Re: Combining Knowledge- and Data-driven Methods for De-identification of Clinical Narratives

Pei Chen Wed, 20 Jan 2016 10:36:06 -0800

Hi,
Sorry I was swamped recently.
But yeah, we can even create an extended type system to store these items 
temporarily and add them into the main/core type system afterwards.
There was an existing item to upgrade UIMA, but agreed, it will require much 
more testing.  If it works, we can upgrade it in our sandbox area or create a 
branch if necessary.


—Pei

> On Jan 18, 2016, at 9:06 AM, Peter Klügl <[email protected]> wrote:
> 
> Hi,
> 
> a new patch is attached.
> 
> @Pei:
> are there suitable annotation types in the cTAKES type system? Some
> project in cTAKES uses something like OntologyMatch... I map it to
> IdentifiedAnnotation right now, but there are many empty features...
> 
> @Azad:
> I changed the rules a bit, especially the capitalization, to match how I
> normally use it in Ruta. The wordlists are compiled to a trie by the Maven
> plugin. I also added the two regexes for URL and email, and I extended the
> regex for the URL. I also changed the evaluation order of some rules
> (with @). Feel free to add simple examples to examples.csv for the unit
> tests.
> 
> Let me know if you need more information about the changes.
> 
> Do you want help with the other rule sets? Or should we split them up?
> 
> Best,
> 
> Peter
> 
> Am 18.01.2016 um 11:04 schrieb Peter Klügl:
>> Hi,
>> 
>> great. I will integrate them in the project and in the next patch.
>> 
>> Best,
>> 
>> Peter
>> 
>> Am 18.01.2016 um 00:58 schrieb Azad Dehghan:
>>> Three NERs translated and uploaded.
>>> 
>>> PS. I will validate all NERs once we have them all completed.
>>> 
>>> Cheers,
>>> Azad
>>> 
>>> On 24 November 2015 at 10:37, Azad Dehghan <[email protected]> wrote:
>>> 
>>>> This is on my todo list for Dec. as well. If there are any more volunteers
>>>> for translating JAPE to RUTA, please get in touch.
>>>> 
>>>> Cheers,
>>>> Azad
>>>> 
>>>> On 24 Nov 2015 09:55, "Peter Klügl" <[email protected]> wrote:
>>>>> Hi,
>>>>> 
>>>>> I just wanted to mention that I haven't forgotten about it. Unfortunately,
>>>>> there is just no spare time right now. I hope I will be able to provide
>>>>> the patches in December.
>>>>> 
>>>>> Best,
>>>>> 
>>>>> Peter
>>>>> 
>>>>> Am 06.11.2015 um 16:40 schrieb Pei Chen:
>>>>>> Hi Peter,
>>>>>> I think the ctakes-examples is probably a good starting point at least
>>>>>> in terms of maven modules, etc.  I think it would be good if we use
>>>>>> uimaFIT style as the primary approach to wiring components together and
>>>>>> generate desc's as secondary...
>>>>>> I think the actual components that would be required are probably best
>>>>>> left up to what is actually required for the best-performing c-deid.  The
>>>>>> output would be interesting; I'm not sure if we should treat this as
>>>>>> an independent preprocessing component or part of a pipeline (in which
>>>>>> case, we may need to propose a change to the type system or perhaps an
>>>>>> alternative JCas view.  You can probably open up that discussion to
>>>>>> the dev group as you see fit.)
>>>>>> 
>>>>>> My 2 cents...
>>>>>> 
>>>>>> 
>>>>>> On Fri, Nov 6, 2015 at 3:38 AM, Peter Klügl <[email protected]> wrote:
>>>>>>> Hi,
>>>>>>> 
>>>>>>> Is there a cTAKES project that may serve as an example of how the cTAKES
>>>>>>> community develops or what a project should look like?
>>>>>>> I learned that different people set up UIMA projects in quite different
>>>>>>> ways, and I do not want to get inspired by "some sort of out-dated"
>>>>>>> approach in the cTAKES repo.
>>>>>>> 
>>>>>>> Are there restrictions or preferences about the preprocessing components
>>>>>>> that should be used and the kind of "output" of the project?
>>>>>>> Components: On which components may the components rely: tokenizer, ...
>>>>>>> parser, ... dict lookup?
>>>>>>> "output": Should the project provide a pipeline or a single AE?
>>>>>>> 
>>>>>>> More comments below.
>>>>>>> 
>>>>>>> Am 03.11.2015 um 16:54 schrieb Azad Dehghan:
>>>>>>>>> Who else plans to provide patches for it? Just to avoid duplicate work
>>>>>>>>> and to coordinate the efforts ...
>>>>>>>>> 
>>>>>>>> I would like to help with translating JAPE to RUTA.
>>>>>>> You can already go ahead with the UIMA Ruta Workbench if you want, or
>>>>>>> wait until I set up the project with ruta integration.
>>>>>>> 
>>>>>>> If any questions arise, just ask :-)
>>>>>>> 
>>>>>>>>> Is there a development dataset which was utilized for the initial
>>>>>>>>> development, and if yes, is it possible to contribute it too?
>>>>>>>>> 
>>>>>>>> The data set is unfortunately not publicly available; i2b2
>>>>>>>> <https://www.i2b2.org/NLP/DataSets/Main.php> typically releases the data
>>>>>>>> sets 12 months after a given challenge; this is done on an individual basis
>>>>>>>> and involves a Data Use Agreement.
>>>>>>>> 
>>>>>>>> However, I will be able to conduct and coordinate the validation.
>>>>>>>> 
>>>>>>> Ok, I'll investigate if we already have access to the dataset here.
>>>>>>> 
>>>>>>> 
>>>>>>>>> My first step would be:
>>>>>>>>> - set up a maven project
>>>>>>>>> - set up a development pipeline in a test (with cTAKES components
>>>>>>>>> replacing the previous ANNIE preprocessing)
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> But one item that we need to review is the 3rd-party library jars that
>>>>>>>>> were included to ensure compatibility.  I’ll be sure to take a look at
>>>>>>>>> that over the next few weeks.
>>>>>>>>> 
>>>>>>>>> —Pei
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> @Pei - once ANNIE components are replaced there should not be a need to
>>>>>>>> worry about the 3rd party libs.
>>>>>>>> 
>>>>>>>> Also, just a thought: we may want to create an independent component for
>>>>>>>> the Two Pass recognition (TwoPass.java), as this method has proven useful
>>>>>>>> for general NER on longitudinal data and is surely useful independent of the
>>>>>>>> deid component.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Cheers,
>>>>>>>> Azad
>>>>>>>> 
>
