Hi Sandeep, I'll try and review this today.
Cheers,
Chris

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.a.mattm...@nasa.gov
WWW: http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

-----Original Message-----
From: sandeep rg <sandeep.f...@gmail.com>
Reply-To: "dev@ctakes.apache.org" <dev@ctakes.apache.org>
Date: Monday, July 22, 2013 7:04 AM
To: "dev@ctakes.apache.org" <dev@ctakes.apache.org>
Subject: Re: to involve in your development group

Sir,

I have gone through some of the medical records, such as bills and patient details. Most of them are printed on a dot matrix printer, which makes it very hard to extract the text from scanned images. I have tested them with professional software such as ABBYY FineReader, which also gave poor output.

But, sir, I am confident I can do it; I just need more knowledge of image processing. Can you suggest anyone on your team who is good at image processing programming?

On Thu, Jul 18, 2013 at 1:22 AM, sandeep rg <sandeep.f...@gmail.com> wrote:

I have done the sequence diagram and made some small changes. Please go through it and tell me if anything more needs to be included.

On Wed, Jul 17, 2013 at 9:37 PM, sandeep rg <sandeep.f...@gmail.com> wrote:

It is just a skeleton of the original proposal.

On Wed, Jul 17, 2013 at 9:31 PM, sandeep rg <sandeep.f...@gmail.com> wrote:

The sample work is shared with you both. If any more details need to be included, please tell me.
The GUI design, schedule, and implementation flow chart still need to be added; they are under construction and will be uploaded within a few hours.

On Wed, Jul 17, 2013 at 7:56 PM, Chen, Pei <pei.c...@childrens.harvard.edu> wrote:

pei.stat...@gmail.com

-----Original Message-----
From: Mattmann, Chris A (398J) [mailto:chris.a.mattm...@jpl.nasa.gov]
Sent: Wednesday, July 17, 2013 10:22 AM
To: dev@ctakes.apache.org
Subject: Re: to involve in your development group

chris.mattm...@gmail.com

-----Original Message-----
From: sandeep rg <sandeep.f...@gmail.com>
Reply-To: "dev@ctakes.apache.org" <dev@ctakes.apache.org>
Date: Wednesday, July 17, 2013 6:53 AM
To: "dev@ctakes.apache.org" <dev@ctakes.apache.org>
Subject: Re: to involve in your development group

Can you provide your Gmail ID so I can share the proposal document with you?

On Tue, Jul 16, 2013 at 11:33 PM, sandeep rg <sandeep.f...@gmail.com> wrote:

Sir,

I will provide the proposal within two days. I am mainly going through the ASF-ICFOSS gateway because, if I go that way and my proposal is selected, ICFOSS will provide some support, such as certificates and a small financial grant.

But the main thing is that I like programming: I like exploring new technologies and working with code. So even if my proposal is rejected, I would like to work on your project as a volunteer, if you allow me.

I am now preparing the proposal and will submit it within 2 days. Chris Mattmann helped me learn more about the proposal format.

On Tue, Jul 16, 2013 at 8:12 PM, Chen, Pei <pei.c...@childrens.harvard.edu> wrote:

Chris/Sandeep,
According to ASF-ICFOSS, I believe the deadline for submitting proposals is this coming Friday (July 19). After that, mentors will have 2 weeks to review and score/accept.
Just curious: are we planning to follow the same process here? Or, since it's all volunteer work, technically Sandeep can still contribute code to the community and participate in the dev group here.

Looking forward to it.
--Pei

-----Original Message-----
From: sandeep rg [mailto:sandeep.f...@gmail.com]
Sent: Monday, July 15, 2013 1:05 PM
To: dev@ctakes.apache.org
Subject: Re: to involve in your development group

Sir,

I have gone through most of the OCR technologies and reached a conclusion: I would like to use Apache Tika and JavaOCR for this purpose.

Tesseract is an OCR tool that can extract text in multiple languages. It is implemented in C++, so it can be accessed from Java through native calls. There is also a Java wrapper, Tess4J, but reviews say that it has many bugs.

Apache Tika is developed in Java. It can extract text from .xls, Word, .txt, PDF, and many other formats, and it is easy to integrate into a project. I have just gone through how it is used.

As for JavaOCR, it is good for extracting text from JPEGs or other scanned images. We can train it on various fonts: the more we train it, the more accurate it becomes, but the slower it gets. I didn't find any particular documentation for it.

On Sun, Jul 14, 2013 at 9:18 PM, sandeep rg <sandeep.f...@gmail.com> wrote:

Thanks a lot to both of you for your support. I will do my best to find a solution for the JIRA issue, and I will share the proposal with both of you.
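
Sandeep's note on JavaOCR says that training on more fonts improves accuracy but slows recognition. One common reason for that trade-off can be sketched with a nearest-template classifier: every training sample is kept as a template, and each unknown glyph is compared against all of them, so recognition cost grows linearly with the training set. This is a generic illustration only, not JavaOCR's actual algorithm or API; the `TemplateMatcher` class, its methods, and the boolean-bitmap glyph representation are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical nearest-template glyph classifier (a generic illustration,
 * not JavaOCR's implementation). Every training sample is stored as a
 * template, and classify() scans all of them, so each recognition costs
 * time proportional to the number of templates.
 */
class TemplateMatcher {
    /** One training sample: a binarized glyph bitmap and the character it shows. */
    private static final class Template {
        final char label;
        final boolean[] pixels;

        Template(char label, boolean[] pixels) {
            this.label = label;
            this.pixels = pixels.clone();
        }
    }

    private final List<Template> templates = new ArrayList<>();
    private long comparisons = 0; // running count of glyph-to-template comparisons

    /** Adds one training sample; all bitmaps are assumed to have the same size. */
    void train(char label, boolean[] pixels) {
        templates.add(new Template(label, pixels));
    }

    /** Returns the label of the closest template by Hamming distance, or '?' if untrained. */
    char classify(boolean[] glyph) {
        char best = '?';
        int bestDistance = Integer.MAX_VALUE;
        for (Template t : templates) {
            comparisons++;
            int distance = 0;
            for (int i = 0; i < glyph.length; i++) {
                if (glyph[i] != t.pixels[i]) {
                    distance++;
                }
            }
            if (distance < bestDistance) {
                bestDistance = distance;
                best = t.label;
            }
        }
        return best;
    }

    long comparisons() {
        return comparisons;
    }
}
```

Storing samples of the same characters in more fonts adds templates, and every added template adds one comparison per recognized glyph. More elaborate classifiers avoid this full scan, which the sketch deliberately omits.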

On Sun, Jul 14, 2013 at 1:46 AM, Chen, Pei <pei.c...@childrens.harvard.edu> wrote:

Sandeep,
It's great to have Chris on board as well; he was one of the coordinators of GSoC.
Looking forward to it.

On Jul 13, 2013, at 12:24 PM, "Mattmann, Chris A (398J)" <chris.a.mattm...@jpl.nasa.gov> wrote:

Hi Sandeep,

That is great news, and good job. OK, for some ideas about developing your proposal: you may want to simply start with a Google Doc and then share it with Pei. I'd be happy to help co-mentor if Pei and you think it's useful too.

Your proposal should likely cover:

1. Background - what's the state of CTAKES-189 and what is it trying to accomplish (include some figures, etc. along with your text)

2. Approach - what are you going to do to solve CTAKES-189? Be specific, and try to break it down into smaller, easily reversible steps.

3. Schedule - how long will this take, and what is the schedule for achieving it?

4. Risks/etc. - any known risks, like are you taking a vacation anytime soon :) or are there other time constraints?

5. References, etc.

HTH, and I'd be happy if you want to share the Google Doc with me as you develop it.

Cheers!

Chris

-----Original Message-----
From: sandeep rg <sandeep.f...@gmail.com>
Reply-To: "dev@ctakes.apache.org" <dev@ctakes.apache.org>
Date: Saturday, July 13, 2013 8:57 AM
To: "dev@ctakes.apache.org" <dev@ctakes.apache.org>
Subject: Re: to involve in your development group

I have also gone through the technologies available for developing OCR; of those, I think Apache Tika and Tesseract are the best for solving the problem.

On Sat, Jul 13, 2013 at 9:02 PM, sandeep rg <sandeep.f...@gmail.com> wrote:

Hi Chris Mattmann,
I participated in the event coordinated by Luciano Resende:

http://community.apache.org/mentoringprogramme-icfoss-pilot.html

From that I learned about open source, and I would like to work on your project, cTAKES. I would like to fix the JIRA issue

https://issues.apache.org/jira/browse/CTAKES-189

Chen Pei accepted my request to be my mentor. Now I want to submit a proposal to Apache about the project I am going to work on. Can you help me prepare a proposal to be submitted before the 18th of this July?

On Sat, Jul 13, 2013 at 2:26 AM, Mattmann, Chris A (398J) <chris.a.mattm...@jpl.nasa.gov> wrote:

Hi Sandeep,

I think the best thing to do is:

1. Develop a JIRA issue here: https://issues.apache.org/jira/browse/CTAKES
1a. You can register for a new account on JIRA.
2. Once your JIRA issue is created, feel free to start a [DISCUSS] thread (e.g., with subject [DISCUSS] "some topic", where "some topic" is perhaps the main idea you have) on dev@ctakes.apache.org, referencing your issue and asking for feedback.
3. Work with the Apache cTAKES PMC and committers to get your patches and other items attached to your issue from #1 committed into the sources.

Ideally, if 1-3 happen and it's a good interaction, you could possibly earn the merit to become a PMC member or committer on the project, since Apache is built on meritocracy.

Cheers,
Chris

-----Original Message-----
From: sandeep rg <sandeep.f...@gmail.com>
Reply-To: "dev@ctakes.apache.org" <dev@ctakes.apache.org>
Date: Thursday, July 11, 2013 11:30 AM
To: "dev@ctakes.apache.org" <dev@ctakes.apache.org>
Subject: Re: to involve in your development group

Can you tell me what details I should include in a proposal? Should I include all the implementation (technical) details in the proposal?

On Thu, Jul 11, 2013 at 9:45 PM, Mattmann, Chris A (398J) <chris.a.mattm...@jpl.nasa.gov> wrote:

Dear Sandeep,

Thanks for your interest in cTAKES.
We would welcome your contribution and are happy to have your interest in the project.

Cheers,
Chris

-----Original Message-----
From: sandeep rg <sandeep.f...@gmail.com>
Reply-To: "dev@ctakes.apache.org" <dev@ctakes.apache.org>
Date: Wednesday, July 10, 2013 11:01 AM
To: "dev@ctakes.apache.org" <dev@ctakes.apache.org>
Subject: Re: to involve in your development group

Sir,

My name is Sandeep RG. I am a B.Tech graduate in computer science, now doing an internship at a company, working in Java.

I have installed everything successfully and am now downloading the resources, which takes a long time.

I have gone through the suggested OCR technologies:
JavaOCR has some good user reviews.
Apache Tika can process many different formats.
Beyond those, there is Tesseract, which is also used for OCR.
Apache PDFBox is also used for text extraction, but only for PDF files.
Now I am going through everything to find the best technology among these.

On Wed, Jul 10, 2013 at 12:52 AM, Chen, Pei <pei.c...@childrens.harvard.edu> wrote:

Hi Sandeep,
I am delighted to work with you on this project.

I was not sure if I understood you correctly: did you mean to say that you have already tried using cTAKES and its components?
If not, you can do an svn checkout of the code and try running the debugger GUI from the command line (or the Eclipse IDE); it will allow you to type in plain text and get back the different structured content (types) that cTAKES produces:

MAVEN_OPTS="-Xmx2g -Xms1g" mvn -PrunCVD compile

From the guide:
https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+3.0+Developer+Install+Guide

A bit of background:
Apache cTAKES uses SVN for version control: https://svn.apache.org/repos/asf/ctakes/trunk/
JIRA for issue tracking: https://issues.apache.org/jira/browse/ctakes
Maven for building and dependency management.
A lot of the developers use the Eclipse IDE for their development.
More info on ctakes.apache.org

cTAKES is built on top of the Apache UIMA framework. Essentially, cTAKES is a collection of Annotators (Java classes) wired together into a pipeline.
Its goal, in a nutshell, is to turn unstructured plain text into a structured/normalized form, and it is specially trained for medical notes.
Right now, cTAKES expects its input in plain-text form and does not have an OCR component.
CTAKES-189 (GSoC: implement OCR/Tika to standardize text inputs) was an idea to allow cTAKES to take in any type of input (PDF, images, Word, XLS, etc.) and pass the text on for cTAKES processing.
[I was originally thinking this could be done in some kind of preprocessing step, or as an optional Annotator that could be added at the beginning of a pipeline.] There may be some existing work that could potentially be reused: Apache Tika (https://issues.apache.org/jira/browse/TIKA-93) as well as some open source OCR toolkits (JavaOCR).
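
The preprocessing idea in Pei's message (turn any input format into plain text before the cTAKES pipeline sees it) can be sketched as a small dispatcher that picks an extractor by file extension. This is a hypothetical design for illustration only: `TextExtractor` and `InputNormalizer` are made-up names, not cTAKES, UIMA, or Tika APIs, and only a UTF-8 plain-text extractor is built in. PDF, Word, XLS, or image handlers (for example, Tika- or OCR-backed ones) would have to be registered separately.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical contract: convert the raw bytes of one input document into plain text. */
interface TextExtractor {
    String extract(byte[] content);
}

/**
 * Hypothetical preprocessing step for CTAKES-189: chooses an extractor by
 * file extension so that downstream components only ever receive plain text.
 */
class InputNormalizer {
    private final Map<String, TextExtractor> byExtension = new HashMap<>();

    InputNormalizer() {
        // Plain-text notes pass through unchanged (assumed to be UTF-8 encoded).
        register("txt", bytes -> new String(bytes, StandardCharsets.UTF_8));
    }

    /** Registers or replaces the extractor for an extension such as "pdf" or "png". */
    void register(String extension, TextExtractor extractor) {
        byExtension.put(extension.toLowerCase(Locale.ROOT), extractor);
    }

    /** Returns the document's text, or throws if no extractor handles its extension. */
    String toPlainText(String fileName, byte[] content) {
        int dot = fileName.lastIndexOf('.');
        String extension = dot < 0 ? "" : fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        TextExtractor extractor = byExtension.get(extension);
        if (extractor == null) {
            throw new IllegalArgumentException("No extractor registered for " + fileName);
        }
        return extractor.extract(content);
    }
}
```

Because `extract` declares no checked exceptions, a Tika- or OCR-backed extractor registered this way would need to rethrow its I/O or parsing errors as unchecked exceptions. Whether such a dispatcher runs as a separate preprocessing step or inside an optional Annotator at the start of the pipeline is left open, as in the discussion above.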

About Me:
http://childrenshospital.org/cfapps/research/data_admin/Site3240/mainpageS3240P8.html
http://www.linkedin.com/in/peistation
http://people.apache.org/committer-index.html#chenpei

-----Original Message-----
From: sandeep rg [mailto:sandeep.f...@gmail.com]
Sent: Tuesday, July 09, 2013 1:19 PM
To: dev@ctakes.apache.org
Subject: Re: to involve in your development group

Thanks a lot for your support. I would like to work with you.

I have gone through the objectives of the software, used the software, and gone through the various components of the project. Can you give me a starting point for learning more about the coding side of the project?

Can you tell me more about the project, and about yourself?

On Tue, Jul 9, 2013 at 1:14 AM, Chen, Pei <pei.c...@childrens.harvard.edu> wrote:

Hi Sandeep,
Thank you for the interest. I just had a quick look at the ICFOSS pilot mentoring program and will be happy to serve as a mentor for your project proposal(s) if you are interested.

--Pei

-----Original Message-----
From: sandeep rg [mailto:sandeep.f...@gmail.com]
Sent: Monday, July 08, 2013 2:24 PM
To: dev@ctakes.apache.org
Subject: Re: to involve in your development group

Sir,

Details of the pilot mentoring programme with ICFOSS India are given at the web address below:
http://community.apache.org/mentoringprogramme-icfoss-pilot.html

I am new to this community, so I need a mentor for the project. It would be very helpful for me.

On Mon, Jul 8, 2013 at 7:22 PM, Chen, Pei <pei.c...@childrens.harvard.edu> wrote:

Hi Sandeep,
Welcome! I am not familiar with the details of ICFOSS-Apache, but please, you are more than welcome to work on the code, and contributions will be greatly appreciated!
There may be a learning curve, but feel free to let us know if you have any questions/issues.
Thanks,
Pei

-----Original Message-----
From: sandeep rg [mailto:sandeep.f...@gmail.com]
Sent: Saturday, July 06, 2013 11:50 AM
To: dev@ctakes.apache.org
Subject: to involve in your development group

My name is Sandeep. I am a B.Tech graduate. I participated in a camp held in Kerala, India, in association with ICFOSS-Apache, called the youth mentoring programme, coordinated by Luciano Resende.

I like the project and would like to get involved in it as a programmer. I have gone through your project and its bug list. I would like to work on the bug "CTAKES-189: GSoC: implement OCR/Tika to standardize text inputs for cTAKES". Can you allow me to work on that?