Hi Sandeep,

I'll try and review this today.

Cheers,
Chris

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.a.mattm...@nasa.gov
WWW:  http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++






-----Original Message-----
From: sandeep rg <sandeep.f...@gmail.com>
Reply-To: "dev@ctakes.apache.org" <dev@ctakes.apache.org>
Date: Monday, July 22, 2013 7:04 AM
To: "dev@ctakes.apache.org" <dev@ctakes.apache.org>
Subject: Re: to involve in your development group

>Sir,
>I have gone through some medical records such as bills and patient
>details. Most of them were printed on a dot matrix printer, which makes
>it very hard to extract text from the scanned images. I tested some
>professional software such as ABBYY FineReader, which also gave poor
>output.
>
>But sir, I am confident I can do it. I need more knowledge about image
>processing, though. Can you suggest anyone on your team who is good at
>image processing programming?
>
>
>On Thu, Jul 18, 2013 at 1:22 AM, sandeep rg <sandeep.f...@gmail.com>
>wrote:
>
>> I have done the sequence diagram and made some small changes. Please go
>> through it and tell me if anything more should be included.
>>
>>
>> On Wed, Jul 17, 2013 at 9:37 PM, sandeep rg <sandeep.f...@gmail.com>
>> wrote:
>>
>>> It is just a skeleton of the original proposal.
>>>
>>>
>>> On Wed, Jul 17, 2013 at 9:31 PM, sandeep rg <sandeep.f...@gmail.com>
>>> wrote:
>>>
>>>> The sample work is shared with you both. If any more details should be
>>>> included, please tell me.
>>>> The GUI design, schedule and implementation flow chart are still to be
>>>> added; they are under construction and will be uploaded within a few
>>>> hours.
>>>>
>>>>
>>>> On Wed, Jul 17, 2013 at 7:56 PM, Chen, Pei <
>>>> pei.c...@childrens.harvard.edu> wrote:
>>>>
>>>>> pei.stat...@gmail.com
>>>>>
>>>>> > -----Original Message-----
>>>>> > From: Mattmann, Chris A (398J)
>>>>>[mailto:chris.a.mattm...@jpl.nasa.gov]
>>>>> > Sent: Wednesday, July 17, 2013 10:22 AM
>>>>> > To: dev@ctakes.apache.org
>>>>> > Subject: Re: to involve in your development group
>>>>> >
>>>>> > chris.mattm...@gmail.com
>>>>> >
>>>>> > -----Original Message-----
>>>>> > From: sandeep rg <sandeep.f...@gmail.com>
>>>>> > Reply-To: "dev@ctakes.apache.org" <dev@ctakes.apache.org>
>>>>> > Date: Wednesday, July 17, 2013 6:53 AM
>>>>> > To: "dev@ctakes.apache.org" <dev@ctakes.apache.org>
>>>>> > Subject: Re: to involve in your development group
>>>>> >
>>>>> > >Can you provide your Gmail ID so that I can share the proposal
>>>>> > >document with you?
>>>>> > >
>>>>> > >
>>>>> > >
>>>>> > >On Tue, Jul 16, 2013 at 11:33 PM, sandeep rg <sandeep.f...@gmail.com>
>>>>> > >wrote:
>>>>> > >
>>>>> > >> Sir,
>>>>> > >> I will provide the proposal within two days. For now I am mainly going
>>>>> > >> through the ASF-ICFOSS gateway, because if I go their way and my
>>>>> > >> proposal is selected, ICFOSS will provide some support to us, such as
>>>>> > >> certificates and small financial support.
>>>>> > >>
>>>>> > >>
>>>>> > >> But the main thing is that I like programming. I like to explore new
>>>>> > >> technologies in coding and to work hands-on with code. So if my
>>>>> > >> proposal is rejected, I would still like to work on your project as a
>>>>> > >> volunteer, if you allow me.
>>>>> > >>
>>>>> > >> I am now preparing the proposal and will submit it within 2 days.
>>>>> > >> Chris Mattmann helped me to learn more about the proposal format.
>>>>> > >>
>>>>> > >>
>>>>> > >> On Tue, Jul 16, 2013 at 8:12 PM, Chen, Pei
>>>>> > >> <pei.c...@childrens.harvard.edu> wrote:
>>>>> > >>
>>>>> > >>> Chris/Sandeep,
>>>>> > >>> According to ASF-ICFOSS, I believe the deadline for submitting
>>>>> > >>> proposals is this coming Friday (July 19).
>>>>> > >>> After that point, mentors will have 2 weeks to review and
>>>>> > >>> score/accept.
>>>>> > >>> Just curious, are we planning to follow the same process here? Or,
>>>>> > >>> since it's all volunteer work, technically Sandeep can still
>>>>> > >>> contribute code to the community and participate in the dev group
>>>>> > >>> here.
>>>>> > >>>
>>>>> > >>> Looking forward to it.
>>>>> > >>> --Pei
>>>>> > >>>
>>>>> > >>>
>>>>> > >>> > -----Original Message-----
>>>>> > >>> > From: sandeep rg [mailto:sandeep.f...@gmail.com]
>>>>> > >>> > Sent: Monday, July 15, 2013 1:05 PM
>>>>> > >>> > To: dev@ctakes.apache.org
>>>>> > >>> > Subject: Re: to involve in your development group
>>>>> > >>> >
>>>>> > >>> > Sir,
>>>>> > >>> > I have gone through most of the OCR technologies and reached a
>>>>> > >>> > conclusion. I would like to use Apache Tika and JavaOCR for this
>>>>> > >>> > purpose.
>>>>> > >>> >
>>>>> > >>> > Tesseract is an OCR tool that can extract text in multiple
>>>>> > >>> > languages. It is implemented in C++, so it can be accessed from
>>>>> > >>> > Java through native functions. There is also a Java wrapper,
>>>>> > >>> > Tess4J, but reviews say that it has many bugs.
>>>>> > >>> >
>>>>> > >>> > Apache Tika is developed in Java. It can be used to extract text
>>>>> > >>> > data from .xls, Word, txt, PDF and many other formats. It is also
>>>>> > >>> > easy to integrate into a project. I have just gone through how it
>>>>> > >>> > is implemented.
>>>>> > >>> >
>>>>> > >>> > As for JavaOCR, it is good for extracting text from a JPEG or
>>>>> > >>> > scanned images. We can train it with various fonts. The more we
>>>>> > >>> > train it, the better its accuracy, but its speed decreases. I
>>>>> > >>> > didn't find any particular documentation for it.
>>>>> > >>> >
>>>>> > >>> >
>>>>> > >>> >
>>>>> > >>> > On Sun, Jul 14, 2013 at 9:18 PM, sandeep rg
>>>>> > >>> > <sandeep.f...@gmail.com>
>>>>> > >>> > wrote:
>>>>> > >>> >
>>>>> > >>> > > Thanks a lot to both of you for your support. I will do my best
>>>>> > >>> > > to find a solution for the JIRA issue. I will share the proposal
>>>>> > >>> > > with both of you.
>>>>> > >>> > >
>>>>> > >>> > >
>>>>> > >>> > >
>>>>> > >>> > > On Sun, Jul 14, 2013 at 1:46 AM, Chen, Pei
>>>>> > >>> > <pei.c...@childrens.harvard.edu
>>>>> > >>> > > > wrote:
>>>>> > >>> > >
>>>>> > >>> > >> Sandeep,
>>>>> > >>> > >> It's great to have Chris on board as well; he was one of the
>>>>> > >>> > >> coordinators of GSoC.
>>>>> > >>> > >> Looking forward to it.
>>>>> > >>> > >>
>>>>> > >>> > >> Sent from my iPhone
>>>>> > >>> > >>
>>>>> > >>> > >> On Jul 13, 2013, at 12:24 PM, "Mattmann, Chris A (398J)" <
>>>>> > >>> > >> chris.a.mattm...@jpl.nasa.gov> wrote:
>>>>> > >>> > >>
>>>>> > >>> > >> > Hi Sandeep,
>>>>> > >>> > >> >
>>>>> > >>> > >> > That is great news, and good job. OK, for some ideas about
>>>>> > >>> > >> > developing your proposal, you may want to simply start with a
>>>>> > >>> > >> > Google Doc, and then share it with Pei. I'd be happy to help
>>>>> > >>> > >> > co-mentor if Pei and you think it's useful too.
>>>>> > >>> > >> >
>>>>> > >>> > >> > Your proposal should likely cover:
>>>>> > >>> > >> >
>>>>> > >>> > >> > 1. Background - what's the state of CTAKES-189 and what's it
>>>>> > >>> > >> >  trying to accomplish
>>>>> > >>> > >> >  (include some figures, etc. along with your text)
>>>>> > >>> > >> >
>>>>> > >>> > >> > 2. Approach - what are you going to do to solve CTAKES-189?
>>>>> > >>> > >> >  Be specific, and try to break it down into smaller, easily
>>>>> > >>> > >> >  reversible steps
>>>>> > >>> > >> >
>>>>> > >>> > >> > 3. Schedule - how long and what is the schedule for achieving
>>>>> > >>> > >> >  this?
>>>>> > >>> > >> >
>>>>> > >>> > >> > 4. Risks/etc. - any known risks, like are you taking a vacation
>>>>> > >>> > >> >  anytime soon :) or are there other time constraints?
>>>>> > >>> > >> >
>>>>> > >>> > >> > 5. References, etc.
>>>>> > >>> > >> >
>>>>> > >>> > >> > HTH and I'd be happy if you want to share the GDocs with me as
>>>>> > >>> > >> > you develop it.
>>>>> > >>> > >> >
>>>>> > >>> > >> > Cheers!
>>>>> > >>> > >> >
>>>>> > >>> > >> > Chris
>>>>> > >>> > >> >
>>>>> > >>> > >> > -----Original Message-----
>>>>> > >>> > >> > From: sandeep rg <sandeep.f...@gmail.com>
>>>>> > >>> > >> > Reply-To: "dev@ctakes.apache.org"
>>>>><dev@ctakes.apache.org>
>>>>> > >>> > >> > Date: Saturday, July 13, 2013 8:57 AM
>>>>> > >>> > >> > To: "dev@ctakes.apache.org" <dev@ctakes.apache.org>
>>>>> > >>> > >> > Subject: Re: to involve in your development group
>>>>> > >>> > >> >
>>>>> > >>> > >> >> I have also gone through the technologies available for
>>>>> > >>> > >> >> developing OCR. From those, I think Apache Tika and Tesseract
>>>>> > >>> > >> >> are best for resolving the problem.
>>>>> > >>> > >> >>
>>>>> > >>> > >> >>
>>>>> > >>> > >> >> On Sat, Jul 13, 2013 at 9:02 PM, sandeep rg
>>>>> > >>> > <sandeep.f...@gmail.com>
>>>>> > >>> > >> >> wrote:
>>>>> > >>> > >> >>
>>>>> > >>> > >> >>> Hi Chris Mattmann,
>>>>> > >>> > >> >>> I participated in the event coordinated by Luciano Resende
>>>>> > >>> > >> >>>
>>>>> > >>> > >> >>> http://community.apache.org/mentoringprogramme-icfoss-pilot.html
>>>>> > >>> > >> >>>
>>>>> > >>> > >> >>> and from that I learned about open source and would like to
>>>>> > >>> > >> >>> work on your project, cTAKES. I would like to fix the JIRA
>>>>> > >>> > >> >>> issue
>>>>> > >>> > >> >>>
>>>>> > >>> > >> >>> https://issues.apache.org/jira/browse/CTAKES-189
>>>>> > >>> > >> >>>
>>>>> > >>> > >> >>> Pei Chen accepted my request to be my mentor. Now I want to
>>>>> > >>> > >> >>> give a proposal to Apache about the project I am going to
>>>>> > >>> > >> >>> work on. Can you help me prepare a proposal to be submitted
>>>>> > >>> > >> >>> before the 18th of this July?
>>>>> > >>> > >> >>>
>>>>> > >>> > >> >>>
>>>>> > >>> > >> >>>
>>>>> > >>> > >> >>>
>>>>> > >>> > >> >>>
>>>>> > >>> > >> >>>
>>>>> > >>> > >> >>> On Sat, Jul 13, 2013 at 2:26 AM, Mattmann, Chris A
>>>>> (398J) <
>>>>> > >>> > >> >>> chris.a.mattm...@jpl.nasa.gov> wrote:
>>>>> > >>> > >> >>>
>>>>> > >>> > >> >>>> Hi Sandeep,
>>>>> > >>> > >> >>>>
>>>>> > >>> > >> >>>> I think the best thing to do is:
>>>>> > >>> > >> >>>>
>>>>> > >>> > >> >>>> 1. Develop a JIRA issue here:
>>>>> > >>> > >> >>>> https://issues.apache.org/jira/browse/CTAKES
>>>>> > >>> > >> >>>> 1a. you can register for a new account on JIRA
>>>>> > >>> > >> >>>> 2. Once your JIRA issue is created, feel free to start a
>>>>> > >>> > >> >>>> [DISCUSS] thread (e.g., with subject [DISCUSS] "some topic"
>>>>> > >>> > >> >>>> where "some topic" is perhaps the main idea you have) on
>>>>> > >>> > >> >>>> dev@ctakes.apache.org, referencing your issue and asking for
>>>>> > >>> > >> >>>> feedback
>>>>> > >>> > >> >>>> 3. Work with the Apache cTAKES PMC and committers to get your
>>>>> > >>> > >> >>>> patches and other items attached to your issue from #1
>>>>> > >>> > >> >>>> committed into the sources
>>>>> > >>> > >> >>>>
>>>>> > >>> > >> >>>> Ideally if 1-3 happen and it's a good interaction, Apache is
>>>>> > >>> > >> >>>> built on meritocracy and you could possibly earn the merit to
>>>>> > >>> > >> >>>> become a PMC member or committer on the project.
>>>>> > >>> > >> >>>>
>>>>> > >>> > >> >>>> Cheers,
>>>>> > >>> > >> >>>> Chris
>>>>> > >>> > >> >>>>
>>>>> > >>> > >> >>>> -----Original Message-----
>>>>> > >>> > >> >>>> From: sandeep rg <sandeep.f...@gmail.com>
>>>>> > >>> > >> >>>> Reply-To: "dev@ctakes.apache.org"
>>>>> > <dev@ctakes.apache.org>
>>>>> > >>> > >> >>>> Date: Thursday, July 11, 2013 11:30 AM
>>>>> > >>> > >> >>>> To: "dev@ctakes.apache.org" <dev@ctakes.apache.org>
>>>>> > >>> > >> >>>> Subject: Re: to involve in your development group
>>>>> > >>> > >> >>>>
>>>>> > >>> > >> >>>>> Can you tell me what details I should include in a proposal?
>>>>> > >>> > >> >>>>> Should I include all implementation (technical) details in
>>>>> > >>> > >> >>>>> the proposal?
>>>>> > >>> > >> >>>>>
>>>>> > >>> > >> >>>>>
>>>>> > >>> > >> >>>>> On Thu, Jul 11, 2013 at 9:45 PM, Mattmann, Chris A
>>>>> (398J)
>>>>> > >>> > >> >>>>> < chris.a.mattm...@jpl.nasa.gov> wrote:
>>>>> > >>> > >> >>>>>
>>>>> > >>> > >> >>>>>> Dear Sandeep,
>>>>> > >>> > >> >>>>>>
>>>>> > >>> > >> >>>>>> Thanks for your interest in cTAKES. We would welcome your
>>>>> > >>> > >> >>>>>> contribution and are happy to have your interest in the
>>>>> > >>> > >> >>>>>> project.
>>>>> > >>> > >> >>>>>>
>>>>> > >>> > >> >>>>>> Cheers,
>>>>> > >>> > >> >>>>>> Chris
>>>>> > >>> > >> >>>>>>
>>>>> > >>> > >> >>>>>> -----Original Message-----
>>>>> > >>> > >> >>>>>> From: sandeep rg <sandeep.f...@gmail.com>
>>>>> > >>> > >> >>>>>> Reply-To: "dev@ctakes.apache.org"
>>>>> > >>> > >> >>>>>> <dev@ctakes.apache.org>
>>>>> > >>> > >> >>>>>> Date: Wednesday, July 10, 2013 11:01 AM
>>>>> > >>> > >> >>>>>> To: "dev@ctakes.apache.org" <dev@ctakes.apache.org>
>>>>> > >>> > >> >>>>>> Subject: Re: to involve in your development group
>>>>> > >>> > >> >>>>>>
>>>>> > >>> > >> >>>>>>> Sir,
>>>>> > >>> > >> >>>>>>>
>>>>> > >>> > >> >>>>>>> My name is Sandeep RG. I am a B.Tech graduate in computer
>>>>> > >>> > >> >>>>>>> science, now doing an internship at a company, working in
>>>>> > >>> > >> >>>>>>> Java.
>>>>> > >>> > >> >>>>>>>
>>>>> > >>> > >> >>>>>>> I have installed everything successfully and am now
>>>>> > >>> > >> >>>>>>> downloading the resources, which is taking a long time.
>>>>> > >>> > >> >>>>>>>
>>>>> > >>> > >> >>>>>>> I have gone through the suggested OCR technologies.
>>>>> > >>> > >> >>>>>>> JavaOCR has some good user reviews.
>>>>> > >>> > >> >>>>>>> Apache Tika can process many different formats.
>>>>> > >>> > >> >>>>>>> Beyond that there is Tesseract, which is also used for OCR.
>>>>> > >>> > >> >>>>>>> Apache PDFBox is also used for text extraction, but only
>>>>> > >>> > >> >>>>>>> for PDF files.
>>>>> > >>> > >> >>>>>>> I am now going through everything to find the best
>>>>> > >>> > >> >>>>>>> technology among these.
>>>>> > >>> > >> >>>>>>>
>>>>> > >>> > >> >>>>>>>
>>>>> > >>> > >> >>>>>>> On Wed, Jul 10, 2013 at 12:52 AM, Chen, Pei
>>>>> > >>> > >> >>>>>>> <pei.c...@childrens.harvard.edu>wrote:
>>>>> > >>> > >> >>>>>>>
>>>>> > >>> > >> >>>>>>>> Hi Sandeep,
>>>>> > >>> > >> >>>>>>>> I am delighted to work with you on this project.
>>>>> > >>> > >> >>>>>>>>
>>>>> > >>> > >> >>>>>>>> I was not sure if I understood you correctly. Did you mean
>>>>> > >>> > >> >>>>>>>> to say that you have already tried using cTAKES and its
>>>>> > >>> > >> >>>>>>>> components?
>>>>> > >>> > >> >>>>>>>> If not, you can do an svn checkout of the code and try
>>>>> > >>> > >> >>>>>>>> running the debugger GUI from the command line (or the
>>>>> > >>> > >> >>>>>>>> Eclipse IDE), which will allow you to type in plain text
>>>>> > >>> > >> >>>>>>>> and get back the different structured content (types)
>>>>> > >>> > >> >>>>>>>> that cTAKES produces:
>>>>> > >>> > >> >>>>>>>> MAVEN_OPTS="-Xmx2g -Xms1g"
>>>>> > >>> > >> >>>>>>>> mvn -PrunCVD compile
>>>>> > >>> > >> >>>>>>>> From the guide:
>>>>> > >>> > >> >>>>>>>> https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+3.0+Developer+Install+Guide
>>>>> > >>> > >> >>>>>>>>
>>>>> > >>> > >> >>>>>>>> A bit of background:
>>>>> > >>> > >> >>>>>>>> Apache cTAKES uses SVN for version control:
>>>>> > >>> > >> >>>>>>>> https://svn.apache.org/repos/asf/ctakes/trunk/
>>>>> > >>> > >> >>>>>>>> Jira for issues tracking:
>>>>> > >>> > >> >>>>>>>> https://issues.apache.org/jira/browse/ctakes
>>>>> > >>> > >> >>>>>>>> Maven for building and dependency management.
>>>>> > >>> > >> >>>>>>>> A lot of the developers use Eclipse IDE for their
>>>>> > >>> > >> >>>>>>>> development.
>>>>> > >>> > >> >>>>>>>> More info on ctakes.apache.org
>>>>> > >>> > >> >>>>>>>>
>>>>> > >>> > >> >>>>>>>> cTAKES is built on top of the Apache UIMA Framework.
>>>>> > >>> > >> >>>>>>>> Essentially, cTAKES is a collection of Annotators (Java
>>>>> > >>> > >> >>>>>>>> classes) wired together into a pipeline.
>>>>> > >>> > >> >>>>>>>> Its goal in a nutshell is to turn unstructured plain text
>>>>> > >>> > >> >>>>>>>> into structured/normalized form, and it is specially
>>>>> > >>> > >> >>>>>>>> trained for medical notes.
>>>>> > >>> > >> >>>>>>>> Right now, the input cTAKES expects is plain text, and
>>>>> > >>> > >> >>>>>>>> cTAKES does not have an OCR component.
>>>>> > >>> > >> >>>>>>>> "CTAKES-189: GSoC: implement OCR/tika to standardize text
>>>>> > >>> > >> >>>>>>>> inputs" was an idea to allow cTAKES to take in any type of
>>>>> > >>> > >> >>>>>>>> input (PDF, Images, Word, XLS, etc.) and pass the text for
>>>>> > >>> > >> >>>>>>>> cTAKES processing.
>>>>> > >>> > >> >>>>>>>> [I was originally thinking this could be done in some kind
>>>>> > >>> > >> >>>>>>>> of preprocessing, or an optional Annotator that could be
>>>>> > >>> > >> >>>>>>>> added at the beginning of a pipeline].  There may be some
>>>>> > >>> > >> >>>>>>>> existing work that could be potentially reused: Apache
>>>>> > >>> > >> >>>>>>>> Tika ( https://issues.apache.org/jira/browse/TIKA-93 ) as
>>>>> > >>> > >> >>>>>>>> well as some open source OCR toolkits (JavaOCR).
>>>>> > >>> > >> >>>>>>>>
>>>>> > >>> > >> >>>>>>>> About Me:
>>>>> > >>> > >> >>>>>>>> http://childrenshospital.org/cfapps/research/data_admin/Site3240/mainpageS3240P8.html
>>>>> > >>> > >> >>>>>>>> http://www.linkedin.com/in/peistation
>>>>> > >>> > >> >>>>>>>> http://people.apache.org/committer-index.html#chenpei
>>>>> > >>> > >> >>>>>>>>
>>>>> > >>> > >> >>>>>>>>> -----Original Message-----
>>>>> > >>> > >> >>>>>>>>> From: sandeep rg [mailto:sandeep.f...@gmail.com]
>>>>> > >>> > >> >>>>>>>>> Sent: Tuesday, July 09, 2013 1:19 PM
>>>>> > >>> > >> >>>>>>>>> To: dev@ctakes.apache.org
>>>>> > >>> > >> >>>>>>>>> Subject: Re: to involve in your development
>>>>>group
>>>>> > >>> > >> >>>>>>>>>
>>>>> > >>> > >> >>>>>>>>> Thanks a lot for giving me support. I would like to work
>>>>> > >>> > >> >>>>>>>>> with you.
>>>>> > >>> > >> >>>>>>>>>
>>>>> > >>> > >> >>>>>>>>> I have gone through the objectives of the software, used
>>>>> > >>> > >> >>>>>>>>> the software and gone through various components of the
>>>>> > >>> > >> >>>>>>>>> project. Can you give me a starting point from which to
>>>>> > >>> > >> >>>>>>>>> learn more about the coding part of the project?
>>>>> > >>> > >> >>>>>>>>>
>>>>> > >>> > >> >>>>>>>>> Can you also tell me more about the project and about
>>>>> > >>> > >> >>>>>>>>> yourself?
>>>>> > >>> > >> >>>>>>>>>
>>>>> > >>> > >> >>>>>>>>>
>>>>> > >>> > >> >>>>>>>>> On Tue, Jul 9, 2013 at 1:14 AM, Chen, Pei
>>>>> > >>> > >> >>>>>>>>> <pei.c...@childrens.harvard.edu>wrote:
>>>>> > >>> > >> >>>>>>>>>
>>>>> > >>> > >> >>>>>>>>>> Hi Sandeep,
>>>>> > >>> > >> >>>>>>>>>> Thank you for the interest.  I just had a quick look at
>>>>> > >>> > >> >>>>>>>>>> the ICFOSS pilot mentoring program and will be happy to
>>>>> > >>> > >> >>>>>>>>>> serve as a mentor for your project proposal(s) if you
>>>>> > >>> > >> >>>>>>>>>> are interested.
>>>>> > >>> > >> >>>>>>>>>>
>>>>> > >>> > >> >>>>>>>>>> --Pei
>>>>> > >>> > >> >>>>>>>>>>
>>>>> > >>> > >> >>>>>>>>>>> -----Original Message-----
>>>>> > >>> > >> >>>>>>>>>>> From: sandeep rg
>>>>>[mailto:sandeep.f...@gmail.com]
>>>>> > >>> > >> >>>>>>>>>>> Sent: Monday, July 08, 2013 2:24 PM
>>>>> > >>> > >> >>>>>>>>>>> To: dev@ctakes.apache.org
>>>>> > >>> > >> >>>>>>>>>>> Subject: Re: to involve in your development
>>>>>group
>>>>> > >>> > >> >>>>>>>>>>>
>>>>> > >>> > >> >>>>>>>>>>> Sir,
>>>>> > >>> > >> >>>>>>>>>>>
>>>>> > >>> > >> >>>>>>>>>>> Details of the pilot mentoring programme with ICFOSS,
>>>>> > >>> > >> >>>>>>>>>>> India are given at the web address below:
>>>>> > >>> > >> >>>>>>>>>>> http://community.apache.org/mentoringprogramme-icfoss-pilot.html
>>>>> > >>> > >> >>>>>>>>>>>
>>>>> > >>> > >> >>>>>>>>>>>
>>>>> > >>> > >> >>>>>>>>>>> I am new to this community, so I need a mentor for the
>>>>> > >>> > >> >>>>>>>>>>> project. It would be very helpful for me.
>>>>> > >>> > >> >>>>>>>>>>>
>>>>> > >>> > >> >>>>>>>>>>>
>>>>> > >>> > >> >>>>>>>>>>> On Mon, Jul 8, 2013 at 7:22 PM, Chen, Pei
>>>>> > >>> > >> >>>>>>>>>>> <pei.c...@childrens.harvard.edu>wrote:
>>>>> > >>> > >> >>>>>>>>>>>
>>>>> > >>> > >> >>>>>>>>>>>> Hi Sandeep,
>>>>> > >>> > >> >>>>>>>>>>>> Welcome!  I am not familiar with the details of
>>>>> > >>> > >> >>>>>>>>>>>> icfoss-apache, but please, you are more than welcome
>>>>> > >>> > >> >>>>>>>>>>>> to work on the code, and contributions will be
>>>>> > >>> > >> >>>>>>>>>>>> greatly appreciated!
>>>>> > >>> > >> >>>>>>>>>>>> There may be a learning curve, but feel free to let
>>>>> > >>> > >> >>>>>>>>>>>> us know if you have any questions/issues.
>>>>> > >>> > >> >>>>>>>>>>>> Thanks,
>>>>> > >>> > >> >>>>>>>>>>>> Pei
>>>>> > >>> > >> >>>>>>>>>>>>
>>>>> > >>> > >> >>>>>>>>>>>>> -----Original Message-----
>>>>> > >>> > >> >>>>>>>>>>>>> From: sandeep rg
>>>>> > [mailto:sandeep.f...@gmail.com]
>>>>> > >>> > >> >>>>>>>>>>>>> Sent: Saturday, July 06, 2013 11:50 AM
>>>>> > >>> > >> >>>>>>>>>>>>> To: dev@ctakes.apache.org
>>>>> > >>> > >> >>>>>>>>>>>>> Subject: to involve in your development
>>>>>group
>>>>> > >>> > >> >>>>>>>>>>>>>
>>>>> > >>> > >> >>>>>>>>>>>>> My name is Sandeep. I am a B.Tech graduate. I
>>>>> > >>> > >> >>>>>>>>>>>>> participated in a camp held in Kerala, India in
>>>>> > >>> > >> >>>>>>>>>>>>> association with icfoss-apache, called the youth
>>>>> > >>> > >> >>>>>>>>>>>>> mentoring programme, coordinated by Luciano Resende.
>>>>> > >>> > >> >>>>>>>>>>>>>
>>>>> > >>> > >> >>>>>>>>>>>>> I like the project and would like to be involved in
>>>>> > >>> > >> >>>>>>>>>>>>> it as a programmer. I have gone through your project
>>>>> > >>> > >> >>>>>>>>>>>>> and through the bugs list. I would like to work on
>>>>> > >>> > >> >>>>>>>>>>>>> the bug "CTAKES-189: GSoC: implement OCR/tika to
>>>>> > >>> > >> >>>>>>>>>>>>> standardize text inputs for cTAKES". Can you allow
>>>>> > >>> > >> >>>>>>>>>>>>> me to work on that?
>>>>> > >>> > >> >
>>>>> > >>> > >>
>>>>> > >>> > >
>>>>> > >>> > >
>>>>> > >>>
>>>>> > >>
>>>>> > >>
>>>>>
>>>>>
>>>>
>>>
>>
