Thank you, Finan Sean, for your suggestion. I am now going through JAI, and I think it has more features than JavaOCR.
On Mon, Jul 22, 2013 at 10:22 PM, Mattmann, Chris A (398J) <chris.a.mattm...@jpl.nasa.gov> wrote:
> Hi Sandeep,
>
> I'll try and review this today.
>
> Cheers,
> Chris
>
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Chris Mattmann, Ph.D.
> Senior Computer Scientist
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 171-266B, Mailstop: 171-246
> Email: chris.a.mattm...@nasa.gov
> WWW: http://sunset.usc.edu/~mattmann/
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Adjunct Assistant Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>
> -----Original Message-----
> From: sandeep rg <sandeep.f...@gmail.com>
> Reply-To: "dev@ctakes.apache.org" <dev@ctakes.apache.org>
> Date: Monday, July 22, 2013 7:04 AM
> To: "dev@ctakes.apache.org" <dev@ctakes.apache.org>
> Subject: Re: to involve in your development group
>
> >sir,
> > i have gone through some of the medical record such as bills,patient
> >details etc. most of them are printed using dot matrix printer,which is
> >very hard to extract such type text from scanned images.i have done testing
> >with some professional software such as abbyy fine reader which also given
> >a poor output.
> >
> >but sir i have the confidence to do it.but i need more knowledge about
> >image processing capabilities.so can you suggest any one who is good in
> >image processing programming in your team?
> >
> >On Thu, Jul 18, 2013 at 1:22 AM, sandeep rg <sandeep.f...@gmail.com> wrote:
> >
> >> i hava done sequence diagram and done some small changes,please go through
> >> it and tell me if any more thing is to be included
> >>
> >> On Wed, Jul 17, 2013 at 9:37 PM, sandeep rg <sandeep.f...@gmail.com>wrote:
> >>
> >>> it just a skeleton of original proposal
> >>>
> >>> On Wed, Jul 17, 2013 at 9:31 PM, sandeep rg <sandeep.f...@gmail.com>wrote:
> >>>
> >>>> the sample work is shared with you both.any more details to be included
> >>>> please tell me.
> >>>> In which,GUI design,schedule and implementation flow chart design is to
> >>>> added which is under construction and will be uploaded within few hours.
> >>>>
> >>>> On Wed, Jul 17, 2013 at 7:56 PM, Chen, Pei <pei.c...@childrens.harvard.edu> wrote:
> >>>>
> >>>>> pei.stat...@gmail.com
> >>>>>
> >>>>> > -----Original Message-----
> >>>>> > From: Mattmann, Chris A (398J) [mailto:chris.a.mattm...@jpl.nasa.gov]
> >>>>> > Sent: Wednesday, July 17, 2013 10:22 AM
> >>>>> > To: dev@ctakes.apache.org
> >>>>> > Subject: Re: to involve in your development group
> >>>>> >
> >>>>> > chris.mattm...@gmail.com
> >>>>> >
> >>>>> > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >>>>> > Chris Mattmann, Ph.D.
> >>>>> > Senior Computer Scientist
> >>>>> > NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> >>>>> > Office: 171-266B, Mailstop: 171-246
> >>>>> > Email: chris.a.mattm...@nasa.gov
> >>>>> > WWW: http://sunset.usc.edu/~mattmann/
> >>>>> > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >>>>> > Adjunct Assistant Professor, Computer Science Department
> >>>>> > University of Southern California, Los Angeles, CA 90089 USA
> >>>>> > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >>>>> >
> >>>>> > -----Original Message-----
> >>>>> > From: sandeep rg <sandeep.f...@gmail.com>
> >>>>> > Reply-To: "dev@ctakes.apache.org" <dev@ctakes.apache.org>
> >>>>> > Date: Wednesday, July 17, 2013 6:53 AM
> >>>>> > To: "dev@ctakes.apache.org" <dev@ctakes.apache.org>
> >>>>> > Subject: Re: to involve in your development group
> >>>>> >
> >>>>> > >can you provide your gmail id to share the proposal document with you?
> >>>>> > >
> >>>>> > >On Tue, Jul 16, 2013 at 11:33 PM, sandeep rg <sandeep.f...@gmail.com> wrote:
> >>>>> > >
> >>>>> > >> sir,
> >>>>> > >> i am providing proposal by two days.now i am mainly going through
> >>>>> > >> ASF-ICFOSS gateway because if i gone through their way and my proposal
> >>>>> > >> is get selected,ICFOSS will provide some sort of support such as
> >>>>> > >> certificates,small financial support etc. to us.
> >>>>> > >>
> >>>>> > >> but,main thing is i like programming,i like to explore through the
> >>>>> > >> new technologies in coding and like to interact with the coding.so if
> >>>>> > >> my proposal is got rejected,then also i like to work in your project
> >>>>> > >> as a volunteer if you allow me..
> >>>>> > >>
> >>>>> > >> now i am preparing a proposal,within 2 days i will submit
> >>>>> > >> it..Mattmann chris helped me to know more about the format of proposal.
> >>>>> > >>
> >>>>> > >> On Tue, Jul 16, 2013 at 8:12 PM, Chen, Pei <pei.c...@childrens.harvard.edu> wrote:
> >>>>> > >>
> >>>>> > >>> Chris/Sandeep,
> >>>>> > >>> According to ASF-ICFOSS, I believe the deadline for submitting
> >>>>> > >>> proposals is this coming Friday (July 19).
> >>>>> > >>> After which point, mentors will have 2 weeks to review and score/accept.
> >>>>> > >>> Just curious, are we planning to follow the same process here? Or
> >>>>> > >>> since it's all volunteer work, technically- sandeep and still
> >>>>> > >>> contribute code to the community and participate in the dev group here.
> >>>>> > >>>
> >>>>> > >>> Looking forward to it.
> >>>>> > >>> --Pei
> >>>>> > >>>
> >>>>> > >>> > -----Original Message-----
> >>>>> > >>> > From: sandeep rg [mailto:sandeep.f...@gmail.com]
> >>>>> > >>> > Sent: Monday, July 15, 2013 1:05 PM
> >>>>> > >>> > To: dev@ctakes.apache.org
> >>>>> > >>> > Subject: Re: to involve in your development group
> >>>>> > >>> >
> >>>>> > >>> > sir,
> >>>>> > >>> > i gone through most of the ocr technologies and reached a conclusion.i
> >>>>> > >>> > would like to use apache tika and java ocr for this pupose.
> >>>>> > >>> >
> >>>>> > >>> > Tessearact is a ocr tool,it can be used for extracting from
> >>>>> > >>> > multiple languages.it is implemented in vc++.so it can acceded
> >>>>> > >>> > using java native function.they provided another tool tess4j but
> >>>>> > >>> > review says that it has many bugs.
> >>>>> > >>> >
> >>>>> > >>> > Apache tika developed in java language.it can be used to extract text data
> >>>>> > >>> > from .xls,word,txt,pdf and other many formats.it is easy for implementing
> >>>>> > >>> > in project also.i have just gone through its implementation way.
> >>>>> > >>> >
> >>>>> > >>> > then about javaocr,its good for extrating text from a jpeg or
> >>>>> > >>> > scanned images.we can train it with various fonts.more we train
> >>>>> > >>> > more will be its accuracy but its speed will get decreased.i didn't
> >>>>> > >>> > find any particular documentation for that.
> >>>>> > >>> >
> >>>>> > >>> > On Sun, Jul 14, 2013 at 9:18 PM, sandeep rg <sandeep.f...@gmail.com> wrote:
> >>>>> > >>> >
> >>>>> > >>> > > thanks a lot for both of your support.I will do my best to find solution
> >>>>> > >>> > > for jira problem.i will share the proposal with both of you..
> >>>>> > >>> > >
> >>>>> > >>> > > On Sun, Jul 14, 2013 at 1:46 AM, Chen, Pei <pei.c...@childrens.harvard.edu> wrote:
> >>>>> > >>> > >
> >>>>> > >>> > >> Sandeep,
> >>>>> > >>> > >> Its great to have Chris on board as well- he was one of the coordinators
> >>>>> > >>> > >> of GSoC.
> >>>>> > >>> > >> Looking forward to it.
> >>>>> > >>> > >>
> >>>>> > >>> > >> Sent from my iPhone
> >>>>> > >>> > >>
> >>>>> > >>> > >> On Jul 13, 2013, at 12:24 PM, "Mattmann, Chris A (398J)" <chris.a.mattm...@jpl.nasa.gov> wrote:
> >>>>> > >>> > >>
> >>>>> > >>> > >> > Hi Sandeep,
> >>>>> > >>> > >> >
> >>>>> > >>> > >> > That is great news, and good job.
> >>>>> > >>> > >> > OK, for some ideas about developing
> >>>>> > >>> > >> > your proposal, you may want to simply start with a Google
> >>>>> > >>> > >> > Docs, and then share it with Pei. I'd be happy to help co-mentor
> >>>>> > >>> > >> > if Pei and you think it's useful too.
> >>>>> > >>> > >> >
> >>>>> > >>> > >> > Your proposal should likely cover:
> >>>>> > >>> > >> >
> >>>>> > >>> > >> > 1. Background - what's the state of CTAKES-189 and what's it trying to
> >>>>> > >>> > >> > accomplish
> >>>>> > >>> > >> > (include some figures, etc. along with your text)
> >>>>> > >>> > >> >
> >>>>> > >>> > >> > 2. Approach - what are you going to do to solve CTAKES-189.
> >>>>> > >>> > >> > Be specific, and
> >>>>> > >>> > >> > try to break it down into smaller, easily reversible steps
> >>>>> > >>> > >> >
> >>>>> > >>> > >> > 3. Schedule - how long and what is the schedule for achieving this?
> >>>>> > >>> > >> >
> >>>>> > >>> > >> > 4. Risks/etc. - any known risks like are you taking a
> >>>>> > >>> > >> > vacation anytime soon :)
> >>>>> > >>> > >> > or are there other time constraints?
> >>>>> > >>> > >> >
> >>>>> > >>> > >> > 5. References, etc.
> >>>>> > >>> > >> >
> >>>>> > >>> > >> > HTH and I'd be happy if you want to share the GDocs with me
> >>>>> > >>> > >> > as you develop it.
> >>>>> > >>> > >> >
> >>>>> > >>> > >> > Cheers!
> >>>>> > >>> > >> >
> >>>>> > >>> > >> > Chris
> >>>>> > >>> > >> >
> >>>>> > >>> > >> > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >>>>> > >>> > >> > Chris Mattmann, Ph.D.
> >>>>> > >>> > >> > Senior Computer Scientist
> >>>>> > >>> > >> > NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> >>>>> > >>> > >> > Office: 171-266B, Mailstop: 171-246
> >>>>> > >>> > >> > Email: chris.a.mattm...@nasa.gov
> >>>>> > >>> > >> > WWW: http://sunset.usc.edu/~mattmann/
> >>>>> > >>> > >> > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >>>>> > >>> > >> > Adjunct Assistant Professor, Computer Science Department
> >>>>> > >>> > >> > University of Southern California, Los Angeles, CA 90089 USA
> >>>>> > >>> > >> > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >>>>> > >>> > >> >
> >>>>> > >>> > >> > -----Original Message-----
> >>>>> > >>> > >> > From: sandeep rg <sandeep.f...@gmail.com>
> >>>>> > >>> > >> > Reply-To: "dev@ctakes.apache.org" <dev@ctakes.apache.org>
> >>>>> > >>> > >> > Date: Saturday, July 13, 2013 8:57 AM
> >>>>> > >>> > >> > To: "dev@ctakes.apache.org" <dev@ctakes.apache.org>
> >>>>> > >>> > >> > Subject: Re: to involve in your development group
> >>>>> > >>> > >> >
> >>>>> > >>> > >> >> i have also gone through the technologies available for development of
> >>>>> > >>> > >> >> ocr,from that i think apache tika and tessearact is best for resolving the
> >>>>> > >>> > >> >> problem.
> >>>>> > >>> > >> >>
> >>>>> > >>> > >> >> On Sat, Jul 13, 2013 at 9:02 PM, sandeep rg <sandeep.f...@gmail.com> wrote:
> >>>>> > >>> > >> >>
> >>>>> > >>> > >> >>> hi Mattamann Chris,
> >>>>> > >>> > >> >>> i has participated in the event coordinated by luciano resende
> >>>>> > >>> > >> >>>
> >>>>> > >>> > >> >>> http://community.apache.org/mentoringprogramme-icfoss-pilot.html
> >>>>> > >>> > >> >>>
> >>>>> > >>> > >> >>> and from that i learned about open source and like to work on your project
> >>>>> > >>> > >> >>> ctakes.i would like to fix the jira
> >>>>> > >>> > >> >>>
> >>>>> > >>> > >> >>> https://issues.apache.org/jira/browse/CTAKES-189
> >>>>> > >>> > >> >>>
> >>>>> > >>> > >> >>> chen pei accepted my requested to be my mentor.now i want to give a
> >>>>> > >>> > >> >>> proposal to apache about the project i am going to work on.can you help me
> >>>>> > >>> > >> >>> to prepare a proposal to be submitted before 18 th of this july.
> >>>>> > >>> > >> >>>
> >>>>> > >>> > >> >>> On Sat, Jul 13, 2013 at 2:26 AM, Mattmann, Chris A (398J) <chris.a.mattm...@jpl.nasa.gov> wrote:
> >>>>> > >>> > >> >>>
> >>>>> > >>> > >> >>>> Hi Sandeep,
> >>>>> > >>> > >> >>>>
> >>>>> > >>> > >> >>>> I think the best thing to do is:
> >>>>> > >>> > >> >>>>
> >>>>> > >>> > >> >>>> 1. Develop a JIRA issue here:
> >>>>> > >>> > >> >>>> https://issues.apache.org/jira/browse/CTAKES
> >>>>> > >>> > >> >>>> 1a. you can register for a new account on JIRA
> >>>>> > >>> > >> >>>> 2.
> >>>>> > >>> > >> >>>> Once your JIRA issue is created, feel free to start a [DISCUSS] thread
> >>>>> > >>> > >> >>>> (e.g., with subject [DISCUSS] "some topic" where "some topic" is perhaps
> >>>>> > >>> > >> >>>> the main idea you have) on dev@ctakes.apache.org, referencing your issue
> >>>>> > >>> > >> >>>> and asking for feedback
> >>>>> > >>> > >> >>>> 3. Work with the Apache cTAKES PMC and committers to get your patches and
> >>>>> > >>> > >> >>>> other items attached to your issue from #1 committed into the sources
> >>>>> > >>> > >> >>>>
> >>>>> > >>> > >> >>>> Ideally if 1-3 happen and it's a good interaction, Apache is built on
> >>>>> > >>> > >> >>>> meritocracy and you could possibly earn the merit to become a PMC member
> >>>>> > >>> > >> >>>> or committer on the project.
> >>>>> > >>> > >> >>>>
> >>>>> > >>> > >> >>>> Cheers,
> >>>>> > >>> > >> >>>> Chris
> >>>>> > >>> > >> >>>>
> >>>>> > >>> > >> >>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >>>>> > >>> > >> >>>> Chris Mattmann, Ph.D.
> >>>>> > >>> > >> >>>> Senior Computer Scientist
> >>>>> > >>> > >> >>>> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> >>>>> > >>> > >> >>>> Office: 171-266B, Mailstop: 171-246
> >>>>> > >>> > >> >>>> Email: chris.a.mattm...@nasa.gov
> >>>>> > >>> > >> >>>> WWW: http://sunset.usc.edu/~mattmann/
> >>>>> > >>> > >> >>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >>>>> > >>> > >> >>>> Adjunct Assistant Professor, Computer Science Department
> >>>>> > >>> > >> >>>> University of Southern California, Los Angeles, CA 90089 USA
> >>>>> > >>> > >> >>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >>>>> > >>> > >> >>>>
> >>>>> > >>> > >> >>>> -----Original Message-----
> >>>>> > >>> > >> >>>> From: sandeep rg <sandeep.f...@gmail.com>
> >>>>> > >>> > >> >>>> Reply-To: "dev@ctakes.apache.org" <dev@ctakes.apache.org>
> >>>>> > >>> > >> >>>> Date: Thursday, July 11, 2013 11:30 AM
> >>>>> > >>> > >> >>>> To: "dev@ctakes.apache.org" <dev@ctakes.apache.org>
> >>>>> > >>> > >> >>>> Subject: Re: to involve in your development group
> >>>>> > >>> > >> >>>>
> >>>>> > >>> > >> >>>>> can you provide what all details i should include in a proposal?whether i
> >>>>> > >>> > >> >>>>> wanted to include all implemetation(technical) details in the proposal?
> >>>>> > >>> > >> >>>>>
> >>>>> > >>> > >> >>>>> On Thu, Jul 11, 2013 at 9:45 PM, Mattmann, Chris A (398J) <chris.a.mattm...@jpl.nasa.gov> wrote:
> >>>>> > >>> > >> >>>>>
> >>>>> > >>> > >> >>>>>> Dear Sandeep,
> >>>>> > >>> > >> >>>>>>
> >>>>> > >>> > >> >>>>>> Thanks for your interest in cTAKES. We would welcome your contribution
> >>>>> > >>> > >> >>>>>> and are happy to have your interest in the project.
> >>>>> > >>> > >> >>>>>>
> >>>>> > >>> > >> >>>>>> Cheers,
> >>>>> > >>> > >> >>>>>> Chris
> >>>>> > >>> > >> >>>>>>
> >>>>> > >>> > >> >>>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >>>>> > >>> > >> >>>>>> Chris Mattmann, Ph.D.
> >>>>> > >>> > >> >>>>>> Senior Computer Scientist
> >>>>> > >>> > >> >>>>>> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> >>>>> > >>> > >> >>>>>> Office: 171-266B, Mailstop: 171-246
> >>>>> > >>> > >> >>>>>> Email: chris.a.mattm...@nasa.gov
> >>>>> > >>> > >> >>>>>> WWW: http://sunset.usc.edu/~mattmann/
> >>>>> > >>> > >> >>>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >>>>> > >>> > >> >>>>>> Adjunct Assistant Professor, Computer Science Department
> >>>>> > >>> > >> >>>>>> University of Southern California, Los Angeles, CA 90089 USA
> >>>>> > >>> > >> >>>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >>>>> > >>> > >> >>>>>>
> >>>>> > >>> > >> >>>>>> -----Original Message-----
> >>>>> > >>> > >> >>>>>> From: sandeep rg <sandeep.f...@gmail.com>
> >>>>> > >>> > >> >>>>>> Reply-To: "dev@ctakes.apache.org"
> >>>>> > >>> > >> >>>>>> <dev@ctakes.apache.org>
> >>>>> > >>> > >> >>>>>> Date: Wednesday, July 10, 2013 11:01 AM
> >>>>> > >>> > >> >>>>>> To: "dev@ctakes.apache.org" <dev@ctakes.apache.org>
> >>>>> > >>> > >> >>>>>> Subject: Re: to involve in your development group
> >>>>> > >>> > >> >>>>>>
> >>>>> > >>> > >> >>>>>>> sir,
> >>>>> > >>> > >> >>>>>>>
> >>>>> > >>> > >> >>>>>>> My name is sandeep rg.i am a btech graduate in computer science.now doing
> >>>>> > >>> > >> >>>>>>> an internship in a company in java language.
> >>>>> > >>> > >> >>>>>>>
> >>>>> > >>> > >> >>>>>>> then i had installed all things succesfully,now downloading the
> >>>>> > >>> > >> >>>>>>> resource.ittake too much time.
> >>>>> > >>> > >> >>>>>>>
> >>>>> > >>> > >> >>>>>>> i have gone through the suggested ocr technologies.
> >>>>> > >>> > >> >>>>>>> Javaocr has some good user review.
> >>>>> > >>> > >> >>>>>>> Apache tika has a capability to process different types of format.
> >>>>> > >>> > >> >>>>>>> More than that there is tesserract which are also used for ocr purpose.
> >>>>> > >>> > >> >>>>>>> then apache pdfbox is also used for text extratcion but only for pdf
> >>>>> > >>> > >> >>>>>>> files.
> >>>>> > >>> > >> >>>>>>> now i am going through every thing to find out best technology from this.
> >>>>> > >>> > >> >>>>>>>
> >>>>> > >>> > >> >>>>>>> On Wed, Jul 10, 2013 at 12:52 AM, Chen, Pei <pei.c...@childrens.harvard.edu>wrote:
> >>>>> > >>> > >> >>>>>>>
> >>>>> > >>> > >> >>>>>>>> Hi Sandeep,
> >>>>> > >>> > >> >>>>>>>> I am delighted to work with you on this project.
> >>>>> > >>> > >> >>>>>>>>
> >>>>> > >>> > >> >>>>>>>> I was not sure if I understood you correctly- did you mean to say that you
> >>>>> > >>> > >> >>>>>>>> have already tried using cTAKES and it's components?
> >>>>> > >>> > >> >>>>>>>> If not, you can do an svn checkout of the code and try running the
> >>>>> > >>> > >> >>>>>>>> debugger gui from the command line (or eclipseide) that will allow you to
> >>>>> > >>> > >> >>>>>>>> type in plain text and get back the different structured content (types)
> >>>>> > >>> > >> >>>>>>>> that cTAKES produces:
> >>>>> > >>> > >> >>>>>>>> MAVEN_OPTS="-Xmx2g -Xms1g"
> >>>>> > >>> > >> >>>>>>>> mvn -PrunCVD compile
> >>>>> > >>> > >> >>>>>>>> From the guide:
> >>>>> > >>> > >> >>>>>>>> https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+3.0+Developer+Install+Guide
> >>>>> > >>> > >> >>>>>>>>
> >>>>> > >>> > >> >>>>>>>> A bit of background:
> >>>>> > >>> > >> >>>>>>>> Apache cTAKES uses SVN for version on control:
> >>>>> > >>> > >> >>>>>>>> https://svn.apache.org/repos/asf/ctakes/trunk/
> >>>>> > >>> > >> >>>>>>>> Jira for issues tracking:
> >>>>> > >>> > >> >>>>>>>> https://issues.apache.org/jira/browse/ctakes
> >>>>> > >>> > >> >>>>>>>> Maven for building and dependency management.
> >>>>> > >>> > >> >>>>>>>> A lot of the developers use Eclipse IDE for their development.
> >>>>> > >>> > >> >>>>>>>> More info on ctakes.apache.org
> >>>>> > >>> > >> >>>>>>>>
> >>>>> > >>> > >> >>>>>>>> cTAKES is built on top of the Apache UIMA Framework. Essentially, cTAKES
> >>>>> > >>> > >> >>>>>>>> is a collection of Annotators (Java Classes) and wired together to into a
> >>>>> > >>> > >> >>>>>>>> pipeline.
> >>>>> > >>> > >> >>>>>>>> It's goal in a nutshell is to turn unstructured plain text into
> >>>>> > >>> > >> >>>>>>>> structured/normalized form and specially trained for medical notes.
> >>>>> > >>> > >> >>>>>>>> Right now- the input cTAKES expects would be in plain text form and cTAKES
> >>>>> > >>> > >> >>>>>>>> does not have an OCR component.
> >>>>> > >>> > >> >>>>>>>> cTAKE-189:GSoC:implement OCR/tika to standardize text inputs was an idea
> >>>>> > >>> > >> >>>>>>>> to allow cTAKES to take in any type of input (PDF, Images, Word, XLS, etc.)
> >>>>> > >>> > >> >>>>>>>> and pass the text for cTAKES processing.
> >>>>> > >>> > >> >>>>>>>> [I was originally thinking this could be done in some kind of
> >>>>> > >>> > >> >>>>>>>> preprocessing, or an optional Annotator that could be added in the
> >>>>> > >>> > >> >>>>>>>> beginning of a pipeline].
> >>>>> > >>> > >> >>>>>>>> There may be some existing work that could be
> >>>>> > >>> > >> >>>>>>>> potentially reused: Apache Tika (
> >>>>> > >>> > >> >>>>>>>> https://issues.apache.org/jira/browse/TIKA-93 ) as well as some open
> >>>>> > >>> > >> >>>>>>>> source OCR toolkits (JavaOCR).
> >>>>> > >>> > >> >>>>>>>>
> >>>>> > >>> > >> >>>>>>>> About Me:
> >>>>> > >>> > >> >>>>>>>> http://childrenshospital.org/cfapps/research/data_admin/Site3240/mainpageS3240P8.html
> >>>>> > >>> > >> >>>>>>>> http://www.linkedin.com/in/peistation
> >>>>> > >>> > >> >>>>>>>> http://people.apache.org/committer-index.html#chenpei
> >>>>> > >>> > >> >>>>>>>>
> >>>>> > >>> > >> >>>>>>>>> -----Original Message-----
> >>>>> > >>> > >> >>>>>>>>> From: sandeep rg [mailto:sandeep.f...@gmail.com]
> >>>>> > >>> > >> >>>>>>>>> Sent: Tuesday, July 09, 2013 1:19 PM
> >>>>> > >>> > >> >>>>>>>>> To: dev@ctakes.apache.org
> >>>>> > >>> > >> >>>>>>>>> Subject: Re: to involve in your development group
> >>>>> > >>> > >> >>>>>>>>>
> >>>>> > >>> > >> >>>>>>>>> Thanks a lot for giving me support.i like to work with you.
> >>>>> > >>> > >> >>>>>>>>>
> >>>>> > >>> > >> >>>>>>>>> I have gone through the objectives of the software,used the software and
> >>>>> > >>> > >> >>>>>>>>> gone through various components of the project.can you provide me starting
> >>>>> > >>> > >> >>>>>>>>> point from where i should start to know more about the coding part of the
> >>>>> > >>> > >> >>>>>>>>> project.
> >>>>> > >>> > >> >>>>>>>>>
> >>>>> > >>> > >> >>>>>>>>> can you tell me more about the project and about you also?
> >>>>> > >>> > >> >>>>>>>>>
> >>>>> > >>> > >> >>>>>>>>> On Tue, Jul 9, 2013 at 1:14 AM, Chen, Pei <pei.c...@childrens.harvard.edu>wrote:
> >>>>> > >>> > >> >>>>>>>>>
> >>>>> > >>> > >> >>>>>>>>>> Hi Sandeep,
> >>>>> > >>> > >> >>>>>>>>>> Thank you for the interest. I just had a quick look at the ICFOSS
> >>>>> > >>> > >> >>>>>>>>>> pilot mentoring program and will be happy to serve as a mentor for
> >>>>> > >>> > >> >>>>>>>>>> your project
> >>>>> > >>> > >> >>>>>>>>>> proposal(s) if you are interested.
> >>>>> > >>> > >> >>>>>>>>>>
> >>>>> > >>> > >> >>>>>>>>>> --Pei
> >>>>> > >>> > >> >>>>>>>>>>
> >>>>> > >>> > >> >>>>>>>>>>> -----Original Message-----
> >>>>> > >>> > >> >>>>>>>>>>> From: sandeep rg [mailto:sandeep.f...@gmail.com]
> >>>>> > >>> > >> >>>>>>>>>>> Sent: Monday, July 08, 2013 2:24 PM
> >>>>> > >>> > >> >>>>>>>>>>> To: dev@ctakes.apache.org
> >>>>> > >>> > >> >>>>>>>>>>> Subject: Re: to involve in your development group
> >>>>> > >>> > >> >>>>>>>>>>>
> >>>>> > >>> > >> >>>>>>>>>>> sir,
> >>>>> > >>> > >> >>>>>>>>>>>
> >>>>> > >>> > >> >>>>>>>>>>> details of the program Pilot mentoring programme with india ICFOSS
> >>>>> > >>> > >> >>>>>>>>>>> is given in the below web address
> >>>>> > >>> > >> >>>>>>>>>>> http://community.apache.org/mentoringprogramme-icfoss-pilot.html
> >>>>> > >>> > >> >>>>>>>>>>>
> >>>>> > >>> > >> >>>>>>>>>>> I am new to this community so i need a mentor for the project.It
> >>>>> > >>> > >> >>>>>>>>>>> will be more helpful for me..
> >>>>> > >>> > >> >>>>>>>>>>>
> >>>>> > >>> > >> >>>>>>>>>>> On Mon, Jul 8, 2013 at 7:22 PM, Chen, Pei <pei.c...@childrens.harvard.edu>wrote:
> >>>>> > >>> > >> >>>>>>>>>>>
> >>>>> > >>> > >> >>>>>>>>>>>> Hi Sandeep,
> >>>>> > >>> > >> >>>>>>>>>>>> Welcome! I am not familiar with the details of icfoss-apache, but
> >>>>> > >>> > >> >>>>>>>>>>>> please- you are more than welcome to work on the code and
> >>>>> > >>> > >> >>>>>>>>>>>> contributions will be greatly appreciated!
> >>>>> > >>> > >> >>>>>>>>>>>> There may be a learning curve, but feel free let us know if you
> >>>>> > >>> > >> >>>>>>>>>>>> have any questions/issues.
> >>>>> > >>> > >> >>>>>>>>>>>> Thanks,
> >>>>> > >>> > >> >>>>>>>>>>>> Pei
> >>>>> > >>> > >> >>>>>>>>>>>>
> >>>>> > >>> > >> >>>>>>>>>>>>> -----Original Message-----
> >>>>> > >>> > >> >>>>>>>>>>>>> From: sandeep rg [mailto:sandeep.f...@gmail.com]
> >>>>> > >>> > >> >>>>>>>>>>>>> Sent: Saturday, July 06, 2013 11:50 AM
> >>>>> > >>> > >> >>>>>>>>>>>>> To: dev@ctakes.apache.org
> >>>>> > >>> > >> >>>>>>>>>>>>> Subject: to involve in your development group
> >>>>> > >>> > >> >>>>>>>>>>>>>
> >>>>> > >>> > >> >>>>>>>>>>>>> my name is sandeep.i am btech graduate.i had participated in a
> >>>>> > >>> > >> >>>>>>>>>>>>> camp coordinated in kerala,India in association
> >>>>> > >>> > >> >>>>>>>>>>>>> with icfoss-apache called as youth
> >>>>> > >>> > >> >>>>>>>>>>>>> mentoring programme coordinated by Luciano resende.
> >>>>> > >>> > >> >>>>>>>>>>>>>
> >>>>> > >>> > >> >>>>>>>>>>>>> i like the project and like to involve in your project as a
> >>>>> > >>> > >> >>>>>>>>>>>>> programmer.i have gone through the your project and gone through
> >>>>> > >>> > >> >>>>>>>>>>>>> the bugs list.I like to work on the bug
> >>>>> > >>> > >> >>>>>>>>>>>>> "cTAKE-189:GSoC:implement OCR/tika to standardize text inputs
> >>>>> > >>> > >> >>>>>>>>>>>>> for cTAKES".can you allow me to work on that?