Re: Combining Knowledge- and Data-driven Methods for De-identification of Clinical Narratives

Azad Dehghan Wed, 20 Jan 2016 02:19:53 -0800

> I integrated your ruta scripts and added a new patch (includes and
> replaces my last one).
>
Ok.


> I noticed some semantic differences between the ruta rules and their
> jape originals, e.g., the brackets for the user name. Are they intended?
>
Brackets should be included. I was getting some inconsistent output so I
removed them for the time being.

> I needed to change some rule elements, e.g., "M.D." does not work as a
> literal rule element match (a very old restriction of ruta which should be
> removed some day...). These literal string matches should be avoided
> altogether if possible, or at least the start anchor should be set to a
> different rule element.
>
OK.

> Ok, I'll let you know when I start with a rule set.

Great!

Azad
> On 20.01.2016 at 01:47, Azad Dehghan wrote:
> > Peter,
> >
> > So, we have Email, Url, Profession, Street, Zip, State and Username
> > completed so far.
> >
> > The following NERs remain:
> > Country, Age, Doctor, Fax, Id_num, Medicalrec_num, Patient, and Phone.
> >
> > I will do Country next. If you are able to translate the rest quickly
> > please do :) else just keep me posted which ones you are working on to
> > avoid duplicate work...and we can work through the remaining NERs.
> >
> > Also, once the NERs are translated I will prepare a number of examples
> > for unit testing -- I will also validate the NERs using the i2b2
> > research dataset.
> >
> > Cheers,
> > Azad
> >
> > On 19 January 2016 at 09:01, Peter Klügl <peter.klu...@averbis.com> wrote:
> >
> >> Ok, let me know which ones I should translate.
> >>
> >> Best,
> >>
> >> Peter
> >>
> >> On 18.01.2016 at 20:13, Azad Dehghan wrote:
> >>> Peter,
> >>>
> >>> Thanks for pushing things!
> >>>
> >>> I would rather split the rules/NERs to get things moving quicker (as I
> >>> am a newbie to Ruta). I will be uploading another NER (Username)
> >>> shortly. I will look at your changes to follow suit.
> >>>
> >>> Best,
> >>> Azad
> >>>
> >>> On 18 January 2016 at 14:06, Peter Klügl <peter.klu...@averbis.com> wrote:
> >>>> Hi,
> >>>>
> >>>> a new patch is attached.
> >>>>
> >>>> @Pei:
> >>>> are there suitable annotation types in the cTAKES type system? Some
> >>>> project in cTAKES uses something like OntologyMatch... I map it to
> >>>> IdentifiedAnnotation right now, but there are many empty features...
> >>>>
> >>>> @Azad:
> >>>> I changed the rules a bit, especially the capitalization, to match how
> >>>> I normally write ruta. The wordlists are compiled to a trie by the
> >>>> maven plugin. I also added the two regexes for url and email, and I
> >>>> extended the regex for the url. I also changed the evaluation order of
> >>>> some rules (with @). Feel free to add simple examples to examples.csv
> >>>> for the unit tests.
> >>>>
> >>>> Let me know if you need more information about the changes.
> >>>>
> >>>> Do you want help with the other rule sets? Or should we split them
> >>>> up?
> >>>>
> >>>> Best,
> >>>>
> >>>> Peter
> >>>>
> >>>> On 18.01.2016 at 11:04, Peter Klügl wrote:
> >>>>> Hi,
> >>>>>
> >>>>> great. I will integrate them in the project and in the next patch.
> >>>>>
> >>>>> Best,
> >>>>>
> >>>>> Peter
> >>>>>
> >>>>> On 18.01.2016 at 00:58, Azad Dehghan wrote:
> >>>>>> Three NERs translated and uploaded.
> >>>>>>
> >>>>>> PS. I will validate all NERs once we have them all completed.
> >>>>>>
> >>>>>> Cheers,
> >>>>>> Azad
> >>>>>>
> >>>>>> On 24 November 2015 at 10:37, Azad Dehghan <azad.dehg...@gmail.com> wrote:
> >>>>>>> This is on my todo list for Dec. as well. If there are any more
> >>>>>>> volunteers for translating JAPE to RUTA, please get in touch.
> >>>>>>>
> >>>>>>> Cheers,
> >>>>>>> Azad
> >>>>>>>
> >>>>>>> On 24 Nov 2015 09:55, "Peter Klügl" <peter.klu...@averbis.com> wrote:
> >>>>>>>> Hi,
> >>>>>>>>
> >>>>>>>> I just wanted to mention that I haven't forgotten about it.
> >>>>>>>> Unfortunately, there is just no spare time right now. I hope I will
> >>>>>>>> be able to provide the patches in December.
> >>>>>>>>
> >>>>>>>> Best,
> >>>>>>>>
> >>>>>>>> Peter
> >>>>>>>>
> >>>>>>>> On 06.11.2015 at 16:40, Pei Chen wrote:
> >>>>>>>>> Hi Peter,
> >>>>>>>>> I think the ctakes-examples is probably a good starting point, at
> >>>>>>>>> least in terms of maven modules, etc. I think it would be good if we
> >>>>>>>>> use uimaFIT style as the primary approach to wiring components
> >>>>>>>>> together and generate desc's as secondary...
> >>>>>>>>> I think the actual components that would be required are probably
> >>>>>>>>> best left up to what is actually required for the best performing
> >>>>>>>>> c-deid. The output would be interesting; I'm not sure if we should
> >>>>>>>>> treat this as an independent preprocessing component or part of a
> >>>>>>>>> pipeline (in which case, we may need to propose a change to the type
> >>>>>>>>> system or perhaps an alternative JCas view. You can probably open up
> >>>>>>>>> that discussion to the dev group as you see fit.)
> >>>>>>>>>
> >>>>>>>>> My 2 cents...
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On Fri, Nov 6, 2015 at 3:38 AM, Peter Klügl <peter.klu...@averbis.com> wrote:
> >>>>>>>>>> Hi,
> >>>>>>>>>>
> >>>>>>>>>> Is there a cTAKES project that may serve as an example of how the
> >>>>>>>>>> cTAKES community develops or what a project should look like?
> >>>>>>>>>> I learned that different people set up UIMA projects in quite
> >>>>>>>>>> different ways and I do not want to get inspired by "some sort of
> >>>>>>>>>> out-dated" approach in the cTAKES repo.
> >>>>>>>>>>
> >>>>>>>>>> Are there restrictions or preferences about the preprocessing
> >>>>>>>>>> components that should be used and the kind of "output" of the
> >>>>>>>>>> project?
> >>>>>>>>>> Components: On which components may the components rely: tokenizer,
> >>>>>>>>>> ... parser, ... dict lookup?
> >>>>>>>>>> "output": Should the project provide a pipeline or a single AE?
> >>>>>>>>>>
> >>>>>>>>>> More comments below.
> >>>>>>>>>>
> >>>>>>>>>> On 03.11.2015 at 16:54, Azad Dehghan wrote:
> >>>>>>>>>>>> Who else plans to provide patches for it? Just to avoid duplicate
> >>>>>>>>>>>> work and to coordinate the efforts ...
> >>>>>>>>>>>>
> >>>>>>>>>>> I would like to help with translating JAPE to RUTA.
> >>>>>>>>>> You can already go ahead with the UIMA Ruta Workbench if you want,
> >>>>>>>>>> or wait until I set up the project with ruta integration.
> >>>>>>>>>>
> >>>>>>>>>> If any questions arise, just ask :-)
> >>>>>>>>>>
> >>>>>>>>>>>> Is there a development dataset which was utilized for the initial
> >>>>>>>>>>>> development, and if yes, is it possible to contribute it too?
> >>>>>>>>>>>>
> >>>>>>>>>>> The data set is unfortunately not publicly available; i2b2
> >>>>>>>>>>> <https://www.i2b2.org/NLP/DataSets/Main.php> typically releases the
> >>>>>>>>>>> data sets 12 months after a given challenge; this is done on an
> >>>>>>>>>>> individual basis and involves a Data Use Agreement.
> >>>>>>>>>>>
> >>>>>>>>>>> However, I will be able to conduct and coordinate the validation.
> >>>>>>>>>>>
> >>>>>>>>>> Ok, I'll investigate if we already have access to the dataset here.
> >>>>>>>>>>
> >>>>>>>>>>>> My first step would be:
> >>>>>>>>>>>> - set up a maven project
> >>>>>>>>>>>> - set up a development pipeline in a test (with cTAKES components
> >>>>>>>>>>>> replacing the previous ANNIE preprocessing)
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> But one item that we need to review is the 3rd party libs jars
> >>>>>>>>>>>> that were included to ensure compatibility. I’ll be sure to take a
> >>>>>>>>>>>> look at that over the next few weeks.
> >>>>>>>>>>>>
> >>>>>>>>>>>> —Pei
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>> @Pei - once ANNIE components are replaced there should not be a
> >>>>>>>>>>> need to worry about the 3rd party libs.
> >>>>>>>>>>>
> >>>>>>>>>>> Also, just a thought: we may want to create an independent
> >>>>>>>>>>> component for the Two Pass recognition (TwoPass.java), as this
> >>>>>>>>>>> method has proven useful for general NER on longitudinal data and
> >>>>>>>>>>> is surely useful independent of the deid component.
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> Cheers,
> >>>>>>>>>>> Azad
> >>>>>>>>>>>
> >>
>
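
For readers following the thread: the messages mention url and email regexes and simple examples for unit tests, but the actual patterns and the layout of examples.csv are not shown here. The following is only a rough sketch in Python of what regex-based Email and Url recognition with example-driven checks might look like. The patterns, the function name find_spans and the sample text are all assumptions made for illustration; they are not taken from the patch or from the ruta scripts.

```python
import re

# Hypothetical patterns for illustration only; NOT the regexes from the patch.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")
# The final character class keeps trailing sentence punctuation out of the match.
URL_RE = re.compile(r"(?:https?://|www\.)[^\s<>\"]+[^\s<>\".,;:!?)]")


def find_spans(text):
    """Return (label, start, end, matched_text) tuples sorted by start offset."""
    spans = []
    for label, pattern in (("Email", EMAIL_RE), ("Url", URL_RE)):
        for match in pattern.finditer(text):
            spans.append((label, match.start(), match.end(), match.group()))
    return sorted(spans, key=lambda span: span[1])


# Made-up sample sentence, in the spirit of a simple unit-test example.
sample = "Contact jane.doe@example.org or see https://www.example.org/info."

for label, start, end, matched in find_spans(sample):
    print(label, start, end, matched)
```

Keeping the offsets next to the label makes it straightforward to compare the output against expected annotations, which is roughly what an example-driven unit test would need.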
