Agreed - sounds very good!
--guergana

From: britt fitch [mailto:[email protected]]
Sent: Sunday, March 22, 2015 11:59 AM
To: [email protected]
Cc: Rohit Shinde
Subject: Re: Medical de-identification

Sounds good.

Starting with some references:
Docs: https://open.med.harvard.edu/wiki/display/SCRUBBER/3.X
Publication: 
http://www.biomedcentral.com/1472-6947/13/112/abstract
  (check out the supplemental material as well for additional details on 
running and improvements)
SVN (old, standalone, Scrubber v.3.x): 
https://open.med.harvard.edu/wiki/display/SCRUBBER/Software
SVN (initial apache port to ctakes sandbox): 
https://svn.apache.org/repos/asf/ctakes/sandbox/ctakes-scrubber-deid/

The project started off as a standalone process and became a UIMA pipeline 
(outside of ctakes).
The plan had always been to port this to an optional ctakes module but we never 
got that fully implemented.

Some of the parts that need the most attention to get going:

  *   working with the ctakes type system
  *   pulling out Weka (ML lib) for an Apache License 2.0 friendly lib instead
  *   simpler process for building the models.

Regarding knowledge, it's good to be familiar with Java, UIMA, decision trees, 
and ctakes. Likely in that order.

While this is still in the sandbox and you are still getting familiar with 
running it as a standalone app, feel free to ping me and Andy off-list if that's 
more convenient.
Then we can definitely bring it back to the dev list while getting it running 
in ctakes.

Cheers,

Britt

Britt Fitch
Wired Informatics
265 Franklin St Ste 1702
Boston, MA 02110
http://wiredinformatics.com
[email protected]

On Mar 20, 2015, at 7:57 PM, andy mcmurry 
<[email protected]> wrote:

Britt et al: here is a student named rohit interested in getting the
deidentification pipeline running again. Hoping there is still interest in
getting this going in ctakes for real. Comments?
---------- Forwarded message ----------
From: "Rohit Shinde" 
<[email protected]>
Date: Mar 20, 2015 5:02 AM
Subject: Re: Medical de-identification
To: "andy mcmurry" <[email protected]>
Cc:

I would certainly be interested in "production grade code". The project
also sounds interesting. How do I start working on it? I know Java well.
What else would I need to know before starting on this project?

On Fri, Mar 20, 2015 at 12:44 PM, andy mcmurry 
<[email protected]>
wrote:


Yes, the project is in Java, the code was written for a research project
and never made into "production grade code". If you are interested, we
would like to turn the scrubber into a solid pipeline. It is 100% Java
programming, using the Colt statistical library.
On Mar 19, 2015 7:52 PM, "Rohit Shinde" 
<[email protected]>
wrote:


Hi Andy,

Could you please tell me more about that project? I would really like a
reply.

Thank you,
Rohit Shinde

On Wed, Mar 18, 2015 at 5:51 PM, Rohit Shinde <
[email protected]> wrote:


Hi Andy,

I am interested in medical de-identification. I would like to know what
this project consists of. Is it partially implemented, or does the
implementation need to start?

What languages would I need to know? What theoretical background would I
need? Also, how complex would this task be? What parts of OpenNLP does this
project use?

Thank you,
Rohit Shinde

