I have also gone through the technologies available for developing OCR. From that, I think Apache Tika and Tesseract are the best fit for resolving the problem.

On Sat, Jul 13, 2013 at 9:02 PM, sandeep rg <[email protected]> wrote:

> Hi Mattmann Chris,
> I participated in the event coordinated by Luciano Resende:
>
> http://community.apache.org/mentoringprogramme-icfoss-pilot.html
>
> From that I learned about open source, and I would like to work on your
> project, cTAKES. I would like to fix this JIRA issue:
>
> https://issues.apache.org/jira/browse/CTAKES-189
>
> Chen Pei accepted my request to be my mentor. Now I want to submit a
> proposal to Apache about the project I am going to work on. Can you help me
> prepare a proposal to be submitted before the 18th of this July?
>
> On Sat, Jul 13, 2013 at 2:26 AM, Mattmann, Chris A (398J) <
> [email protected]> wrote:
>
>> Hi Sandeep,
>>
>> I think the best thing to do is:
>>
>> 1. Develop a JIRA issue here:
>> https://issues.apache.org/jira/browse/CTAKES
>> 1a. you can register for a new account on JIRA
>> 2. Once your JIRA issue is created, feel free to start a [DISCUSS] thread
>> (e.g., with subject [DISCUSS] "some topic" where "some topic" is perhaps
>> the main idea you have) on [email protected], referencing your issue
>> and asking for feedback
>> 3. Work with the Apache cTAKES PMC and committers to get your patches and
>> other items attached to your issue from #1 committed into the sources
>>
>> Ideally if 1-3 happen and it's a good interaction, Apache is built on
>> meritocracy and you could possibly earn the merit to become a PMC member
>> or committer on the project.
>>
>> Cheers,
>> Chris
>>
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> Chris Mattmann, Ph.D.
>> Senior Computer Scientist
>> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>> Office: 171-266B, Mailstop: 171-246
>> Email: [email protected]
>> WWW: http://sunset.usc.edu/~mattmann/
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> Adjunct Assistant Professor, Computer Science Department
>> University of Southern California, Los Angeles, CA 90089 USA
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>
>> -----Original Message-----
>> From: sandeep rg <[email protected]>
>> Reply-To: "[email protected]" <[email protected]>
>> Date: Thursday, July 11, 2013 11:30 AM
>> To: "[email protected]" <[email protected]>
>> Subject: Re: to involve in your development group
>>
>> >Can you tell me which details I should include in a proposal? Should I
>> >include all the implementation (technical) details in the proposal?
>> >
>> >On Thu, Jul 11, 2013 at 9:45 PM, Mattmann, Chris A (398J) <
>> >[email protected]> wrote:
>> >
>> >> Dear Sandeep,
>> >>
>> >> Thanks for your interest in cTAKES. We would welcome your contribution
>> >> and are happy to have your interest in the project.
>> >>
>> >> Cheers,
>> >> Chris
>> >>
>> >> -----Original Message-----
>> >> From: sandeep rg <[email protected]>
>> >> Reply-To: "[email protected]" <[email protected]>
>> >> Date: Wednesday, July 10, 2013 11:01 AM
>> >> To: "[email protected]" <[email protected]>
>> >> Subject: Re: to involve in your development group
>> >>
>> >> >Sir,
>> >> >
>> >> >My name is Sandeep RG. I am a BTech graduate in computer science, now
>> >> >doing an internship at a company, working in Java.
>> >> >
>> >> >I have installed everything successfully and am now downloading the
>> >> >resources, which is taking a long time.
>> >> >
>> >> >I have gone through the suggested OCR technologies.
>> >> >JavaOCR has some good user reviews.
>> >> >Apache Tika can process many different file formats.
>> >> >There is also Tesseract, which is also used for OCR.
>> >> >Apache PDFBox is also used for text extraction, but only for PDF
>> >> >files.
>> >> >I am now going through everything to find the best technology among
>> >> >these.
>> >> >
>> >> >On Wed, Jul 10, 2013 at 12:52 AM, Chen, Pei
>> >> ><[email protected]> wrote:
>> >> >
>> >> >> Hi Sandeep,
>> >> >> I am delighted to work with you on this project.
>> >> >>
>> >> >> I was not sure if I understood you correctly: did you mean to say
>> >> >> that you have already tried using cTAKES and its components?
>> >> >> If not, you can do an svn checkout of the code and try running the
>> >> >> debugger GUI from the command line (or Eclipse IDE), which will allow
>> >> >> you to type in plain text and get back the different structured
>> >> >> content (types) that cTAKES produces:
>> >> >> MAVEN_OPTS="-Xmx2g -Xms1g"
>> >> >> mvn -PrunCVD compile
>> >> >> From the guide:
>> >> >>
>> >> >> https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+3.0+Developer+Install+Guide
>> >> >>
>> >> >> A bit of background:
>> >> >> Apache cTAKES uses SVN for version control:
>> >> >> https://svn.apache.org/repos/asf/ctakes/trunk/
>> >> >> Jira for issue tracking:
>> >> >> https://issues.apache.org/jira/browse/ctakes
>> >> >> Maven for building and dependency management.
>> >> >> A lot of the developers use Eclipse IDE for their development.
>> >> >> More info on ctakes.apache.org
>> >> >>
>> >> >> cTAKES is built on top of the Apache UIMA Framework. Essentially,
>> >> >> cTAKES is a collection of Annotators (Java classes) wired together
>> >> >> into a pipeline.
>> >> >> Its goal in a nutshell is to turn unstructured plain text into
>> >> >> structured/normalized form, and it is specially trained for medical
>> >> >> notes.
>> >> >> Right now, the input cTAKES expects is plain text, and cTAKES does
>> >> >> not have an OCR component.
>> >> >> CTAKES-189:GSoC:implement OCR/tika to standardize text inputs was an
>> >> >> idea to allow cTAKES to take in any type of input (PDF, Images, Word,
>> >> >> XLS, etc.) and pass the text for cTAKES processing.
>> >> >> [I was originally thinking this could be done in some kind of
>> >> >> preprocessing, or an optional Annotator that could be added at the
>> >> >> beginning of a pipeline]. There may be some existing work that could
>> >> >> potentially be reused: Apache Tika (
>> >> >> https://issues.apache.org/jira/browse/TIKA-93 ) as well as some open
>> >> >> source OCR toolkits (JavaOCR).
>> >> >>
>> >> >> About Me:
>> >> >>
>> >> >> http://childrenshospital.org/cfapps/research/data_admin/Site3240/mainpageS3240P8.html
>> >> >> http://www.linkedin.com/in/peistation
>> >> >> http://people.apache.org/committer-index.html#chenpei
>> >> >>
>> >> >> > -----Original Message-----
>> >> >> > From: sandeep rg [mailto:[email protected]]
>> >> >> > Sent: Tuesday, July 09, 2013 1:19 PM
>> >> >> > To: [email protected]
>> >> >> > Subject: Re: to involve in your development group
>> >> >> >
>> >> >> > Thanks a lot for your support. I would like to work with you.
>> >> >> >
>> >> >> > I have gone through the objectives of the software, used the
>> >> >> > software, and gone through various components of the project. Can
>> >> >> > you give me a starting point for learning more about the coding
>> >> >> > part of the project?
>> >> >> >
>> >> >> > Can you tell me more about the project and about yourself as well?
>> >> >> >
>> >> >> > On Tue, Jul 9, 2013 at 1:14 AM, Chen, Pei
>> >> >> > <[email protected]> wrote:
>> >> >> >
>> >> >> > > Hi Sandeep,
>> >> >> > > Thank you for the interest. I just had a quick look at the ICFOSS
>> >> >> > > pilot mentoring program and will be happy to serve as a mentor
>> >> >> > > for your project proposal(s) if you are interested.
>> >> >> > >
>> >> >> > > --Pei
>> >> >> > >
>> >> >> > > > -----Original Message-----
>> >> >> > > > From: sandeep rg [mailto:[email protected]]
>> >> >> > > > Sent: Monday, July 08, 2013 2:24 PM
>> >> >> > > > To: [email protected]
>> >> >> > > > Subject: Re: to involve in your development group
>> >> >> > > >
>> >> >> > > > Sir,
>> >> >> > > >
>> >> >> > > > Details of the pilot mentoring programme with ICFOSS, India,
>> >> >> > > > are given at the web address below:
>> >> >> > > >
>> >> >> > > > http://community.apache.org/mentoringprogramme-icfoss-pilot.html
>> >> >> > > >
>> >> >> > > > I am new to this community, so I need a mentor for the project.
>> >> >> > > > It would be very helpful for me.
>> >> >> > > >
>> >> >> > > > On Mon, Jul 8, 2013 at 7:22 PM, Chen, Pei
>> >> >> > > > <[email protected]> wrote:
>> >> >> > > >
>> >> >> > > > > Hi Sandeep,
>> >> >> > > > > Welcome! I am not familiar with the details of icfoss-apache,
>> >> >> > > > > but please, you are more than welcome to work on the code,
>> >> >> > > > > and contributions will be greatly appreciated!
>> >> >> > > > > There may be a learning curve, but feel free to let us know
>> >> >> > > > > if you have any questions/issues.
>> >> >> > > > > Thanks,
>> >> >> > > > > Pei
>> >> >> > > > >
>> >> >> > > > > > -----Original Message-----
>> >> >> > > > > > From: sandeep rg [mailto:[email protected]]
>> >> >> > > > > > Sent: Saturday, July 06, 2013 11:50 AM
>> >> >> > > > > > To: [email protected]
>> >> >> > > > > > Subject: to involve in your development group
>> >> >> > > > > >
>> >> >> > > > > > My name is Sandeep. I am a BTech graduate. I participated
>> >> >> > > > > > in a camp held in Kerala, India, in association with
>> >> >> > > > > > icfoss-apache, called the youth mentoring programme and
>> >> >> > > > > > coordinated by Luciano Resende.
>> >> >> > > > > >
>> >> >> > > > > > I like the project and would like to get involved in it as
>> >> >> > > > > > a programmer. I have gone through your project and the
>> >> >> > > > > > bugs list. I would like to work on the bug
>> >> >> > > > > > "CTAKES-189:GSoC:implement OCR/tika to standardize text
>> >> >> > > > > > inputs for cTAKES". Can you allow me to work on that?
