Sandeep,
It's great to have Chris on board as well; he was one of the coordinators of GSoC. Looking forward to it.
Sent from my iPhone

On Jul 13, 2013, at 12:24 PM, "Mattmann, Chris A (398J)" <[email protected]> wrote:

> Hi Sandeep,
>
> That is great news, and good job. OK, for some ideas about developing
> your proposal, you may want to simply start with a Google Doc, and then
> share it with Pei. I'd be happy to help co-mentor if Pei and you think
> it's useful too.
>
> Your proposal should likely cover:
>
> 1. Background - what's the state of CTAKES-189 and what's it trying to
>    accomplish (include some figures, etc. along with your text)
>
> 2. Approach - what are you going to do to solve CTAKES-189. Be specific,
>    and try to break it down into smaller, easily reversible steps
>
> 3. Schedule - how long and what is the schedule for achieving this?
>
> 4. Risks/etc. - any known risks like are you taking a vacation anytime
>    soon :) or are there other time constraints?
>
> 5. References, etc.
>
> HTH and I'd be happy if you want to share the GDocs with me as you develop
> it.
>
> Cheers!
>
> Chris
>
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Chris Mattmann, Ph.D.
> Senior Computer Scientist
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 171-266B, Mailstop: 171-246
> Email: [email protected]
> WWW: http://sunset.usc.edu/~mattmann/
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Adjunct Assistant Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>
> -----Original Message-----
> From: sandeep rg <[email protected]>
> Reply-To: "[email protected]" <[email protected]>
> Date: Saturday, July 13, 2013 8:57 AM
> To: "[email protected]" <[email protected]>
> Subject: Re: to involve in your development group
>
>> I have also gone through the technologies available for OCR development;
>> from those, I think Apache Tika and Tesseract are the best for resolving
>> the problem.
>>
>> On Sat, Jul 13, 2013 at 9:02 PM, sandeep rg <[email protected]> wrote:
>>
>>> Hi Chris Mattmann,
>>>
>>> I participated in the event coordinated by Luciano Resende:
>>>
>>> http://community.apache.org/mentoringprogramme-icfoss-pilot.html
>>>
>>> From that I learned about open source, and I would like to work on your
>>> project, cTAKES. I would like to fix this JIRA issue:
>>>
>>> https://issues.apache.org/jira/browse/CTAKES-189
>>>
>>> Pei Chen accepted my request to be my mentor. Now I want to submit a
>>> proposal to Apache about the project I am going to work on. Can you help
>>> me prepare a proposal to be submitted before the 18th of this July?
>>>
>>> On Sat, Jul 13, 2013 at 2:26 AM, Mattmann, Chris A (398J) <[email protected]> wrote:
>>>
>>>> Hi Sandeep,
>>>>
>>>> I think the best thing to do is:
>>>>
>>>> 1. Develop a JIRA issue here:
>>>>    https://issues.apache.org/jira/browse/CTAKES
>>>>    1a. You can register for a new account on JIRA.
>>>> 2. Once your JIRA issue is created, feel free to start a [DISCUSS] thread
>>>>    (e.g., with subject [DISCUSS] "some topic", where "some topic" is perhaps
>>>>    the main idea you have) on [email protected], referencing your issue
>>>>    and asking for feedback.
>>>> 3. Work with the Apache cTAKES PMC and committers to get your patches and
>>>>    other items attached to your issue from #1 committed into the sources.
>>>>
>>>> Ideally if 1-3 happen and it's a good interaction, Apache is built on
>>>> meritocracy and you could possibly earn the merit to become a PMC member
>>>> or committer on the project.
>>>>
>>>> Cheers,
>>>> Chris
>>>>
>>>> -----Original Message-----
>>>> From: sandeep rg <[email protected]>
>>>> Reply-To: "[email protected]" <[email protected]>
>>>> Date: Thursday, July 11, 2013 11:30 AM
>>>> To: "[email protected]" <[email protected]>
>>>> Subject: Re: to involve in your development group
>>>>
>>>>> Can you tell me what details I should include in a proposal? Should I
>>>>> include all the implementation (technical) details in the proposal?
>>>>>
>>>>> On Thu, Jul 11, 2013 at 9:45 PM, Mattmann, Chris A (398J) <[email protected]> wrote:
>>>>>
>>>>>> Dear Sandeep,
>>>>>>
>>>>>> Thanks for your interest in cTAKES. We would welcome your contribution
>>>>>> and are happy to have your interest in the project.
>>>>>>
>>>>>> Cheers,
>>>>>> Chris
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: sandeep rg <[email protected]>
>>>>>> Reply-To: "[email protected]" <[email protected]>
>>>>>> Date: Wednesday, July 10, 2013 11:01 AM
>>>>>> To: "[email protected]" <[email protected]>
>>>>>> Subject: Re: to involve in your development group
>>>>>>
>>>>>>> Sir,
>>>>>>>
>>>>>>> My name is Sandeep RG. I am a BTech graduate in computer science, now
>>>>>>> doing an internship at a company, working in Java.
>>>>>>>
>>>>>>> I have installed everything successfully and am now downloading the
>>>>>>> resources. It takes a long time.
>>>>>>>
>>>>>>> I have gone through the suggested OCR technologies.
>>>>>>> JavaOCR has some good user reviews.
>>>>>>> Apache Tika can process many different file formats.
>>>>>>> There is also Tesseract, which is used for OCR.
>>>>>>> Apache PDFBox is also used for text extraction, but only for PDF files.
>>>>>>> I am now going through everything to find the best technology among these.
>>>>>>>
>>>>>>> On Wed, Jul 10, 2013 at 12:52 AM, Chen, Pei <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hi Sandeep,
>>>>>>>> I am delighted to work with you on this project.
>>>>>>>>
>>>>>>>> I was not sure if I understood you correctly- did you mean to say that
>>>>>>>> you have already tried using cTAKES and its components?
>>>>>>>> If not, you can do an svn checkout of the code and try running the
>>>>>>>> debugger GUI from the command line (or Eclipse IDE), which will allow you
>>>>>>>> to type in plain text and get back the different structured content (types)
>>>>>>>> that cTAKES produces:
>>>>>>>> MAVEN_OPTS="-Xmx2g -Xms1g"
>>>>>>>> mvn -PrunCVD compile
>>>>>>>> From the guide:
>>>>>>>> https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+3.0+Developer+Install+Guide
>>>>>>>>
>>>>>>>> A bit of background:
>>>>>>>> Apache cTAKES uses SVN for version control:
>>>>>>>> https://svn.apache.org/repos/asf/ctakes/trunk/
>>>>>>>> Jira for issue tracking:
>>>>>>>> https://issues.apache.org/jira/browse/ctakes
>>>>>>>> Maven for building and dependency management.
>>>>>>>> A lot of the developers use Eclipse IDE for their development.
>>>>>>>> More info on ctakes.apache.org
>>>>>>>>
>>>>>>>> cTAKES is built on top of the Apache UIMA Framework. Essentially, cTAKES
>>>>>>>> is a collection of Annotators (Java classes) wired together into a
>>>>>>>> pipeline.
>>>>>>>> Its goal in a nutshell is to turn unstructured plain text into
>>>>>>>> structured/normalized form, and it is specially trained for medical notes.
>>>>>>>> Right now, the input cTAKES expects is plain text, and cTAKES
>>>>>>>> does not have an OCR component.
>>>>>>>> CTAKES-189 (GSoC: implement OCR/tika to standardize text inputs) was an
>>>>>>>> idea to allow cTAKES to take in any type of input (PDF, images, Word, XLS,
>>>>>>>> etc.) and pass the text for cTAKES processing.
>>>>>>>> [I was originally thinking this could be done in some kind of
>>>>>>>> preprocessing, or an optional Annotator that could be added at the
>>>>>>>> beginning of a pipeline]. There may be some existing work that could be
>>>>>>>> potentially reused: Apache Tika (
>>>>>>>> https://issues.apache.org/jira/browse/TIKA-93 ) as well as some open
>>>>>>>> source OCR toolkits (JavaOCR).
>>>>>>>>
>>>>>>>> About Me:
>>>>>>>> http://childrenshospital.org/cfapps/research/data_admin/Site3240/mainpageS3240P8.html
>>>>>>>> http://www.linkedin.com/in/peistation
>>>>>>>> http://people.apache.org/committer-index.html#chenpei
>>>>>>>>
>>>>>>>>> -----Original Message-----
>>>>>>>>> From: sandeep rg [mailto:[email protected]]
>>>>>>>>> Sent: Tuesday, July 09, 2013 1:19 PM
>>>>>>>>> To: [email protected]
>>>>>>>>> Subject: Re: to involve in your development group
>>>>>>>>>
>>>>>>>>> Thanks a lot for your support. I would like to work with you.
>>>>>>>>>
>>>>>>>>> I have gone through the objectives of the software, used the software,
>>>>>>>>> and gone through the various components of the project. Can you give me
>>>>>>>>> a starting point for learning more about the coding part of the project?
>>>>>>>>>
>>>>>>>>> Can you tell me more about the project, and about yourself as well?
>>>>>>>>>
>>>>>>>>> On Tue, Jul 9, 2013 at 1:14 AM, Chen, Pei <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Sandeep,
>>>>>>>>>> Thank you for the interest. I just had a quick look at the ICFOSS
>>>>>>>>>> pilot mentoring program and will be happy to serve as a mentor for
>>>>>>>>>> your project proposal(s) if you are interested.
>>>>>>>>>>
>>>>>>>>>> --Pei
>>>>>>>>>>
>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>> From: sandeep rg [mailto:[email protected]]
>>>>>>>>>>> Sent: Monday, July 08, 2013 2:24 PM
>>>>>>>>>>> To: [email protected]
>>>>>>>>>>> Subject: Re: to involve in your development group
>>>>>>>>>>>
>>>>>>>>>>> Sir,
>>>>>>>>>>>
>>>>>>>>>>> Details of the pilot mentoring programme with India's ICFOSS are given
>>>>>>>>>>> at the web address below:
>>>>>>>>>>> http://community.apache.org/mentoringprogramme-icfoss-pilot.html
>>>>>>>>>>>
>>>>>>>>>>> I am new to this community, so I need a mentor for the project. It
>>>>>>>>>>> will be very helpful for me.
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Jul 8, 2013 at 7:22 PM, Chen, Pei <[email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi Sandeep,
>>>>>>>>>>>> Welcome! I am not familiar with the details of icfoss-apache, but
>>>>>>>>>>>> please- you are more than welcome to work on the code and
>>>>>>>>>>>> contributions will be greatly appreciated!
>>>>>>>>>>>> There may be a learning curve, but feel free to let us know if you
>>>>>>>>>>>> have any questions/issues.
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Pei
>>>>>>>>>>>>
>>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>>> From: sandeep rg [mailto:[email protected]]
>>>>>>>>>>>>> Sent: Saturday, July 06, 2013 11:50 AM
>>>>>>>>>>>>> To: [email protected]
>>>>>>>>>>>>> Subject: to involve in your development group
>>>>>>>>>>>>>
>>>>>>>>>>>>> My name is Sandeep. I am a BTech graduate. I participated in a
>>>>>>>>>>>>> camp held in Kerala, India in association with icfoss-apache, called
>>>>>>>>>>>>> the youth mentoring programme, coordinated by Luciano Resende.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I like the project and would like to be involved in it as a
>>>>>>>>>>>>> programmer. I have gone through your project and through
>>>>>>>>>>>>> the bug list. I would like to work on the bug
>>>>>>>>>>>>> "CTAKES-189: GSoC: implement OCR/tika to standardize text inputs
>>>>>>>>>>>>> for cTAKES". Can you allow me to work on that?
