Chris/Sandeep,

According to ASF-ICFOSS, I believe the deadline for submitting proposals is this coming Friday (July 19), after which mentors will have 2 weeks to review and score/accept them. Just curious, are we planning to follow the same process here? Or, since it's all volunteer work, technically Sandeep can still contribute code to the community and participate in the dev group here.
Looking forward to it.
--Pei

> -----Original Message-----
> From: sandeep rg [mailto:[email protected]]
> Sent: Monday, July 15, 2013 1:05 PM
> To: [email protected]
> Subject: Re: to involve in your development group
>
> Sir,
> I have gone through most of the OCR technologies and reached a conclusion. I
> would like to use Apache Tika and JavaOCR for this purpose.
>
> Tesseract is an OCR tool that can extract text in multiple languages. It is
> implemented in C++, so it can be accessed from Java through native
> functions. There is another tool, Tess4J, but reviews say that it has
> many bugs.
>
> Apache Tika is developed in Java. It can be used to extract text data
> from .xls, Word, .txt, PDF and many other formats. It is also easy to
> integrate into a project. I have just gone through its implementation approach.
>
> As for JavaOCR, it is good for extracting text from JPEG or scanned
> images. We can train it with various fonts; the more we train it, the better
> its accuracy, but its speed will decrease. I didn't find any particular
> documentation for it.
>
> On Sun, Jul 14, 2013 at 9:18 PM, sandeep rg <[email protected]>
> wrote:
>
> > Thanks a lot to both of you for your support. I will do my best to find a
> > solution for the JIRA problem. I will share the proposal with both of you.
> >
> > On Sun, Jul 14, 2013 at 1:46 AM, Chen, Pei <[email protected]>
> > wrote:
> >
> >> Sandeep,
> >> It's great to have Chris on board as well- he was one of the coordinators
> >> of GSoC.
> >> Looking forward to it.
> >>
> >> Sent from my iPhone
> >>
> >> On Jul 13, 2013, at 12:24 PM, "Mattmann, Chris A (398J)" <
> >> [email protected]> wrote:
> >>
> >> > Hi Sandeep,
> >> >
> >> > That is great news, and good job. OK, for some ideas about developing
> >> > your proposal, you may want to simply start with a Google Doc, and then
> >> > share it with Pei. I'd be happy to help co-mentor if Pei and you think
> >> > it's useful too.
> >> >
> >> > Your proposal should likely cover:
> >> >
> >> > 1. Background - what's the state of CTAKES-189 and what's it trying to
> >> > accomplish (include some figures, etc. along with your text)
> >> >
> >> > 2. Approach - what are you going to do to solve CTAKES-189. Be specific,
> >> > and try to break it down into smaller, easily reversible steps
> >> >
> >> > 3. Schedule - how long and what is the schedule for achieving this?
> >> >
> >> > 4. Risks/etc. - any known risks like are you taking a vacation anytime
> >> > soon :) or are there other time constraints?
> >> >
> >> > 5. References, etc.
> >> >
> >> > HTH and I'd be happy if you want to share the GDocs with me as you
> >> > develop it.
> >> >
> >> > Cheers!
> >> >
> >> > Chris
> >> >
> >> > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >> > Chris Mattmann, Ph.D.
> >> > Senior Computer Scientist
> >> > NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> >> > Office: 171-266B, Mailstop: 171-246
> >> > Email: [email protected]
> >> > WWW: http://sunset.usc.edu/~mattmann/
> >> > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >> > Adjunct Assistant Professor, Computer Science Department
> >> > University of Southern California, Los Angeles, CA 90089 USA
> >> > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >> >
> >> > -----Original Message-----
> >> > From: sandeep rg <[email protected]>
> >> > Reply-To: "[email protected]" <[email protected]>
> >> > Date: Saturday, July 13, 2013 8:57 AM
> >> > To: "[email protected]" <[email protected]>
> >> > Subject: Re: to involve in your development group
> >> >
> >> >> I have also gone through the technologies available for OCR
> >> >> development; of those, I think Apache Tika and Tesseract are the best
> >> >> for resolving the problem.
> >> >>
> >> >> On Sat, Jul 13, 2013 at 9:02 PM, sandeep rg <[email protected]>
> >> >> wrote:
> >> >>
> >> >>> Hi Chris Mattmann,
> >> >>> I participated in the event coordinated by Luciano Resende:
> >> >>>
> >> >>> http://community.apache.org/mentoringprogramme-icfoss-pilot.html
> >> >>>
> >> >>> From that I learned about open source, and I would like to work on
> >> >>> your project cTAKES. I would like to fix the JIRA issue
> >> >>>
> >> >>> https://issues.apache.org/jira/browse/CTAKES-189
> >> >>>
> >> >>> Chen Pei accepted my request to be my mentor. Now I want to give a
> >> >>> proposal to Apache about the project I am going to work on. Can you
> >> >>> help me prepare a proposal to be submitted before the 18th of this
> >> >>> July?
> >> >>>
> >> >>> On Sat, Jul 13, 2013 at 2:26 AM, Mattmann, Chris A (398J) <
> >> >>> [email protected]> wrote:
> >> >>>
> >> >>>> Hi Sandeep,
> >> >>>>
> >> >>>> I think the best thing to do is:
> >> >>>>
> >> >>>> 1. Develop a JIRA issue here:
> >> >>>> https://issues.apache.org/jira/browse/CTAKES
> >> >>>> 1a. you can register for a new account on JIRA
> >> >>>> 2. Once your JIRA issue is created, feel free to start a [DISCUSS]
> >> >>>> thread (e.g., with subject [DISCUSS] "some topic" where "some topic"
> >> >>>> is perhaps the main idea you have) on [email protected],
> >> >>>> referencing your issue and asking for feedback
> >> >>>> 3. Work with the Apache cTAKES PMC and committers to get your patches
> >> >>>> and other items attached to your issue from #1 committed into the
> >> >>>> sources
> >> >>>>
> >> >>>> Ideally if 1-3 happen and it's a good interaction, Apache is built on
> >> >>>> meritocracy and you could possibly earn the merit to become a PMC
> >> >>>> member or committer on the project.
> >> >>>>
> >> >>>> Cheers,
> >> >>>> Chris
> >> >>>>
> >> >>>> -----Original Message-----
> >> >>>> From: sandeep rg <[email protected]>
> >> >>>> Reply-To: "[email protected]" <[email protected]>
> >> >>>> Date: Thursday, July 11, 2013 11:30 AM
> >> >>>> To: "[email protected]" <[email protected]>
> >> >>>> Subject: Re: to involve in your development group
> >> >>>>
> >> >>>>> Can you tell me what details I should include in a proposal? Should I
> >> >>>>> include all implementation (technical) details in the proposal?
> >> >>>>>
> >> >>>>> On Thu, Jul 11, 2013 at 9:45 PM, Mattmann, Chris A (398J) <
> >> >>>>> [email protected]> wrote:
> >> >>>>>
> >> >>>>>> Dear Sandeep,
> >> >>>>>>
> >> >>>>>> Thanks for your interest in cTAKES. We would welcome your
> >> >>>>>> contribution and are happy to have your interest in the project.
> >> >>>>>>
> >> >>>>>> Cheers,
> >> >>>>>> Chris
> >> >>>>>>
> >> >>>>>> -----Original Message-----
> >> >>>>>> From: sandeep rg <[email protected]>
> >> >>>>>> Reply-To: "[email protected]" <[email protected]>
> >> >>>>>> Date: Wednesday, July 10, 2013 11:01 AM
> >> >>>>>> To: "[email protected]" <[email protected]>
> >> >>>>>> Subject: Re: to involve in your development group
> >> >>>>>>
> >> >>>>>>> Sir,
> >> >>>>>>>
> >> >>>>>>> My name is Sandeep RG. I am a BTech graduate in computer science, now
> >> >>>>>>> doing an internship at a company, working in Java.
> >> >>>>>>>
> >> >>>>>>> I have installed everything successfully and am now downloading the
> >> >>>>>>> resources. It takes too much time.
> >> >>>>>>>
> >> >>>>>>> I have gone through the suggested OCR technologies.
> >> >>>>>>> JavaOCR has some good user reviews.
> >> >>>>>>> Apache Tika is capable of processing different types of formats.
> >> >>>>>>> Beyond that, there is Tesseract, which is also used for OCR.
> >> >>>>>>> Apache PDFBox is also used for text extraction, but only for PDF
> >> >>>>>>> files.
> >> >>>>>>> Now I am going through everything to find the best technology among
> >> >>>>>>> these.
> >> >>>>>>>
> >> >>>>>>> On Wed, Jul 10, 2013 at 12:52 AM, Chen, Pei
> >> >>>>>>> <[email protected]> wrote:
> >> >>>>>>>
> >> >>>>>>>> Hi Sandeep,
> >> >>>>>>>> I am delighted to work with you on this project.
> >> >>>>>>>>
> >> >>>>>>>> I was not sure if I understood you correctly- did you mean to say
> >> >>>>>>>> that you have already tried using cTAKES and its components?
> >> >>>>>>>> If not, you can do an svn checkout of the code and try running the
> >> >>>>>>>> debugger GUI from the command line (or Eclipse IDE) that will allow
> >> >>>>>>>> you to type in plain text and get back the different structured
> >> >>>>>>>> content (types) that cTAKES produces:
> >> >>>>>>>> MAVEN_OPTS="-Xmx2g -Xms1g"
> >> >>>>>>>> mvn -PrunCVD compile
> >> >>>>>>>> From the guide:
> >> >>>>>>>> https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+3.0+Developer+Install+Guide
> >> >>>>>>>>
> >> >>>>>>>> A bit of background:
> >> >>>>>>>> Apache cTAKES uses SVN for version control:
> >> >>>>>>>> https://svn.apache.org/repos/asf/ctakes/trunk/
> >> >>>>>>>> Jira for issue tracking:
> >> >>>>>>>> https://issues.apache.org/jira/browse/ctakes
> >> >>>>>>>> Maven for building and dependency management.
> >> >>>>>>>> A lot of the developers use Eclipse IDE for their development.
> >> >>>>>>>> More info on ctakes.apache.org
> >> >>>>>>>>
> >> >>>>>>>> cTAKES is built on top of the Apache UIMA Framework. Essentially,
> >> >>>>>>>> cTAKES is a collection of Annotators (Java classes) wired together
> >> >>>>>>>> into a pipeline.
> >> >>>>>>>> Its goal in a nutshell is to turn unstructured plain text into a
> >> >>>>>>>> structured/normalized form, and it is specially trained for medical
> >> >>>>>>>> notes.
> >> >>>>>>>> Right now, the input cTAKES expects is plain text, and cTAKES
> >> >>>>>>>> does not have an OCR component.
> >> >>>>>>>> "CTAKES-189: GSoC: implement OCR/tika to standardize text inputs" was
> >> >>>>>>>> an idea to allow cTAKES to take in any type of input (PDF, images,
> >> >>>>>>>> Word, XLS, etc.) and pass the text on for cTAKES processing.
> >> >>>>>>>> [I was originally thinking this could be done in some kind of
> >> >>>>>>>> preprocessing, or an optional Annotator that could be added at the
> >> >>>>>>>> beginning of a pipeline]. There may be some existing work that
> >> >>>>>>>> could potentially be reused: Apache Tika (
> >> >>>>>>>> https://issues.apache.org/jira/browse/TIKA-93 ) as well as some
> >> >>>>>>>> open source OCR toolkits (JavaOCR).
> >> >>>>>>>>
> >> >>>>>>>> About Me:
> >> >>>>>>>> http://childrenshospital.org/cfapps/research/data_admin/Site3240/mainpageS3240P8.html
> >> >>>>>>>> http://www.linkedin.com/in/peistation
> >> >>>>>>>> http://people.apache.org/committer-index.html#chenpei
> >> >>>>>>>>
> >> >>>>>>>>> -----Original Message-----
> >> >>>>>>>>> From: sandeep rg [mailto:[email protected]]
> >> >>>>>>>>> Sent: Tuesday, July 09, 2013 1:19 PM
> >> >>>>>>>>> To: [email protected]
> >> >>>>>>>>> Subject: Re: to involve in your development group
> >> >>>>>>>>>
> >> >>>>>>>>> Thanks a lot for giving me support. I would like to work with you.
> >> >>>>>>>>>
> >> >>>>>>>>> I have gone through the objectives of the software, used the
> >> >>>>>>>>> software and gone through various components of the project. Can you
> >> >>>>>>>>> give me a starting point from where I should start to learn more
> >> >>>>>>>>> about the coding part of the project?
> >> >>>>>>>>>
> >> >>>>>>>>> Can you tell me more about the project and about yourself also?
> >> >>>>>>>>>
> >> >>>>>>>>> On Tue, Jul 9, 2013 at 1:14 AM, Chen, Pei
> >> >>>>>>>>> <[email protected]> wrote:
> >> >>>>>>>>>
> >> >>>>>>>>>> Hi Sandeep,
> >> >>>>>>>>>> Thank you for the interest. I just had a quick look at the ICFOSS
> >> >>>>>>>>>> pilot mentoring program and will be happy to serve as a mentor for
> >> >>>>>>>>>> your project proposal(s) if you are interested.
> >> >>>>>>>>>>
> >> >>>>>>>>>> --Pei
> >> >>>>>>>>>>
> >> >>>>>>>>>>> -----Original Message-----
> >> >>>>>>>>>>> From: sandeep rg [mailto:[email protected]]
> >> >>>>>>>>>>> Sent: Monday, July 08, 2013 2:24 PM
> >> >>>>>>>>>>> To: [email protected]
> >> >>>>>>>>>>> Subject: Re: to involve in your development group
> >> >>>>>>>>>>>
> >> >>>>>>>>>>> Sir,
> >> >>>>>>>>>>>
> >> >>>>>>>>>>> Details of the pilot mentoring programme with India ICFOSS are
> >> >>>>>>>>>>> given at the web address below:
> >> >>>>>>>>>>> http://community.apache.org/mentoringprogramme-icfoss-pilot.html
> >> >>>>>>>>>>>
> >> >>>>>>>>>>> I am new to this community, so I need a mentor for the project. It
> >> >>>>>>>>>>> will be more helpful for me.
> >> >>>>>>>>>>>
> >> >>>>>>>>>>> On Mon, Jul 8, 2013 at 7:22 PM, Chen, Pei
> >> >>>>>>>>>>> <[email protected]> wrote:
> >> >>>>>>>>>>>
> >> >>>>>>>>>>>> Hi Sandeep,
> >> >>>>>>>>>>>> Welcome! I am not familiar with the details of icfoss-apache, but
> >> >>>>>>>>>>>> please- you are more than welcome to work on the code, and
> >> >>>>>>>>>>>> contributions will be greatly appreciated!
> >> >>>>>>>>>>>> There may be a learning curve, but feel free to let us know if you
> >> >>>>>>>>>>>> have any questions/issues.
> >> >>>>>>>>>>>> Thanks,
> >> >>>>>>>>>>>> Pei
> >> >>>>>>>>>>>>
> >> >>>>>>>>>>>>> -----Original Message-----
> >> >>>>>>>>>>>>> From: sandeep rg [mailto:[email protected]]
> >> >>>>>>>>>>>>> Sent: Saturday, July 06, 2013 11:50 AM
> >> >>>>>>>>>>>>> To: [email protected]
> >> >>>>>>>>>>>>> Subject: to involve in your development group
> >> >>>>>>>>>>>>>
> >> >>>>>>>>>>>>> My name is Sandeep. I am a BTech graduate. I participated in a
> >> >>>>>>>>>>>>> camp held in Kerala, India in association with icfoss-apache,
> >> >>>>>>>>>>>>> called the youth mentoring programme and coordinated by Luciano
> >> >>>>>>>>>>>>> Resende.
> >> >>>>>>>>>>>>>
> >> >>>>>>>>>>>>> I like the project and would like to be involved in it as a
> >> >>>>>>>>>>>>> programmer. I have gone through your project and through the
> >> >>>>>>>>>>>>> bug list. I would like to work on the bug
> >> >>>>>>>>>>>>> "CTAKES-189: GSoC: implement OCR/tika to standardize text inputs
> >> >>>>>>>>>>>>> for cTAKES". Can you allow me to work on that?
