Thanks, Britt! I am downloading the source code now and will install it soon. I have my mid-semester exams for the next three days; after that I will come back and start learning about what you have told me.
I am very familiar with Java and know decision trees very well, but I know very little about UIMA. I will learn more about ctakes soon. What should I know about UIMA?

On Sun, Mar 22, 2015 at 9:28 PM, britt fitch <[email protected]> wrote:

> Sounds good.
>
> Starting with some references:
> Docs: https://open.med.harvard.edu/wiki/display/SCRUBBER/3.X
> Publication: http://www.biomedcentral.com/1472-6947/13/112/abstract
> (check out the supplemental material as well for additional details on
> running and improvements)
> SVN (old, standalone, Scrubber v.3.x):
> https://open.med.harvard.edu/wiki/display/SCRUBBER/Software
> SVN (initial apache port to ctakes sandbox):
> https://svn.apache.org/repos/asf/ctakes/sandbox/ctakes-scrubber-deid/
>
> The project started off as a standalone process and became a UIMA pipeline
> (outside of ctakes).
> The plan had always been to port this to an optional ctakes module but we
> never got that fully implemented.
>
> Some of the parts that need the most attention to get going:
>
> - working with the ctakes type system
> - pulling out weka (ML lib) for an asf 2.0 friendly lib instead
> - simpler process for building the models.
>
> Regarding knowledge, its good to be familiar with java, UIMA, decision
> trees, and ctakes. Likely in that order.
>
> While this is still in the sandbox and you are still getting familiar with
> running it as a standalone app feel free to ping me and andy off-list if
> thats more convenient.
> Then we can definitely bring it back to the dev list while getting it
> running in ctakes.
>
> Cheers,
>
> Britt
>
> Britt Fitch
> Wired Informatics
> 265 Franklin St Ste 1702
> Boston, MA 02110
> http://wiredinformatics.com
> [email protected]
>
> On Mar 20, 2015, at 7:57 PM, andy mcmurry <[email protected]> wrote:
>
> Britt et al: here is a student named rohit interested in getting the
> deidentification pipeline running again. Hoping there is still interest in
> getting this going in ctakes for real.
> Comments?
>
> ---------- Forwarded message ----------
> From: "Rohit Shinde" <[email protected]>
> Date: Mar 20, 2015 5:02 AM
> Subject: Re: Medical de-identification
> To: "andy mcmurry" <[email protected]>
> Cc:
>
> I would certainly be interested into "production grade code". The project
> also sounds interesting. How do I start working on it? I know Java well.
> What else would I need to know before starting on this project?
>
> On Fri, Mar 20, 2015 at 12:44 PM, andy mcmurry <[email protected]>
> wrote:
>
> Yes, the project is in Java, the code was written for a research project
> and never made into "production grade code". If you are interested, we
> would like to turn the scrubber into a solid pipeline. Java programming
> 100%, with Colt statistical library
>
> On Mar 19, 2015 7:52 PM, "Rohit Shinde" <[email protected]>
> wrote:
>
> Hi Andy,
>
> Could you please tell me more about that project? I would really like a
> reply.
>
> Thank you,
> Rohit Shinde
>
> On Wed, Mar 18, 2015 at 5:51 PM, Rohit Shinde <
> [email protected]> wrote:
>
> Hi Andy,
>
> I am interested in medical de-identification. I would like to know what
> this project consists of. Is it partially implemented, or does the
> implementation need to start?
>
> What languages would I need to know? What theoretical background would I
> need? Also, how complex would this task be? What parts of OpenNLP does this
> project use?
>
> Thank you,
> Rohit Shinde
