RE: Allergy Annotator

Finan, Sean Wed, 07 Dec 2016 12:50:07 -0800

Hi Sean,

Even with a change to your sentence detection you may need to change your 
negation annotator.


As a quick change, you can add an annotator to deal specifically with your 
situation.  It can be simpler or more elaborate, but something like this:


   static private final Pattern NEGATIVE_PATTERN
         = Pattern.compile(
         "(?:\\s?:\\s*)(?:NEGATIVE|(?:NO\\.?\\b)|NONE|(?:NOT (?:SEEN|PRESENT|INDICATED|FOUND|DISCOVERED)?))",
         Pattern.CASE_INSENSITIVE );

   /**
    * Finds list-style negations
    * {@inheritDoc}
    */
   @Override
   public void process( final JCas jcas ) throws AnalysisEngineProcessException {
      LOGGER.info( "Starting Processing" );
      final Collection<DiseaseDisorderMention> diseases
            = JCasUtil.select( jcas, DiseaseDisorderMention.class );
      if ( !diseases.isEmpty() ) {
         processType( jcas, diseases );
      }
      final Collection<SignSymptomMention> findings
            = JCasUtil.select( jcas, SignSymptomMention.class );
      if ( !findings.isEmpty() ) {
         processType( jcas, findings );
      }
      LOGGER.info( "Finished Processing" );
   }

   static private void processType( final JCas jcas,
                                    final Collection<? extends IdentifiedAnnotation> annotations ) {
      final String docText = jcas.getDocumentText();
      for ( IdentifiedAnnotation annotation : annotations ) {
         // Text following the annotation, searched below for a ": No" style negation.
         String window;
         final int annotationEnd = annotation.getEnd();
         final int maxEnd = Math.min( docText.length(), annotationEnd + 60 );
         final List<Sentence> covering
               = JCasUtil.selectCovering( jcas, Sentence.class, annotation );
         if ( covering == null || covering.isEmpty() ) {
            // No covering sentence: fall back to the next 60 characters.
            LOGGER.warn( "Identified Annotation spans not within a Sentence : "
                         + annotation.getCoveredText() );
            window = docText.substring( annotationEnd, maxEnd );
         } else if ( covering.size() > 1 ) {
            // Several covering sentences: read up to the latest sentence end.
            LOGGER.warn( DocumentIDAnnotationUtil.getDocumentID( jcas ) );
            LOGGER.warn( "Identified Annotation spans " + covering.size()
                         + " Sentences : " + annotation.getCoveredText() );
            final int sentencesEnd = covering.stream()
                                             .mapToInt( Sentence::getEnd )
                                             .max()
                                             .orElse( maxEnd );
            window = docText.substring( annotationEnd, sentencesEnd );
//            covering.stream().map( Sentence::getCoveredText ).forEach( LOGGER::warn );
         } else {
            // Usual case: read up to the end of the single covering sentence.
            window = docText.substring( annotationEnd, covering.get( 0 ).getEnd() );
         }
         final Matcher matcher = NEGATIVE_PATTERN.matcher( window );
         if ( matcher.find() ) {
            annotation.setPolarity( CONST.NE_POLARITY_NEGATION_PRESENT );
         }
      }
   }
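
To try the pattern outside of a pipeline, here is a minimal standalone sketch. It is only an illustration: the class and method names (ListNegationDemo, isListNegated) are made up, and it assumes the window runs from the end of the mention to the end of the line, rather than to the end of a cTAKES Sentence as in the annotator above.

```java
import java.util.regex.Pattern;

class ListNegationDemo {

   // Same expression as NEGATIVE_PATTERN in the annotator above.
   static final Pattern NEGATIVE_PATTERN = Pattern.compile(
         "(?:\\s?:\\s*)(?:NEGATIVE|(?:NO\\.?\\b)|NONE|(?:NOT (?:SEEN|PRESENT|INDICATED|FOUND|DISCOVERED)?))",
         Pattern.CASE_INSENSITIVE );

   /**
    * Assumption for this sketch: the window is the rest of the line after the mention.
    *
    * @param docText    full note text
    * @param mentionEnd end offset (exclusive) of the mention in docText
    * @return true if the text after the mention looks like ": No", ": None", ": Negative", ...
    */
   static boolean isListNegated( final String docText, final int mentionEnd ) {
      int lineEnd = docText.indexOf( '\n', mentionEnd );
      if ( lineEnd < 0 ) {
         lineEnd = docText.length();
      }
      final String window = docText.substring( mentionEnd, lineEnd );
      return NEGATIVE_PATTERN.matcher( window ).find();
   }

   public static void main( final String[] args ) {
      final String doc = "Hyperlipidemia:  Yes\nHypercholesterolemia:  No\n";
      final int end = doc.indexOf( "Hypercholesterolemia" ) + "Hypercholesterolemia".length();
      System.out.println( isListNegated( doc, end ) );
   }
}
```

With the three-line list from the original question, only the Hypercholesterolemia line should be flagged; "Yes" and "N/A" are not covered by the pattern.
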

-----Original Message-----
From: Mullane, Sean *HS [mailto:sp...@hscmail.mcc.virginia.edu] 
Sent: Wednesday, December 07, 2016 3:30 PM
To: 'Tomasz Oliwa'
Cc: 'dev@ctakes.apache.org'
Subject: RE: Allergy Annotator

I'm reviving this thread with reference to negation detection. I previously 
posted about this to the User list but this is probably a more appropriate 
venue.

The way the sentences are split on ":" makes the negation annotator miss 
negation in lists of this form:

Hyperlipidemia:  Yes
Hypercholesterolemia:  No
Chronic Renal Insufficiency:  N/A

I tried reversing order and removing ":"s and found that the negation for 
Hypercholesterolemia is detected when in this form:

Yes Hyperlipidemia
No Hypercholesterolemia
N/A Chronic Renal Insufficiency

Our notes have quite a few places with this sort of list where good negation 
detection is important but I haven't had very good results. The sentence 
segmentator sees this as 12 separate sentences, but I would think proper 
behavior would be to consider this as 6 sentences (breaking sentences on line 
break but not on colons). I see previous discussion on the list about the 
sentence segmentator breaking on newlines but little regarding colons. I would 
think in most cases it would be more useful not to break on ":". Or is there an 
overriding reason for the current behavior?
If changing the sentence segmentator isn't an option is there a different way 
to configure the negation detection annotator that would avoid this issue?

Thanks,
Sean



Hi,

I am interested in the design decision of the sentence detector.

Why does it split a sentence of the form "WORD1: WORD2 WORD3." into two 
sentences "WORD1:" and "WORD2 WORD3."? Do other components of cTAKES require 
such a sentence splitting?

It would seem to me that it should remain one sentence. For example, the 
smoking status detector has its own SentenceAdjuster that merges some of such 
sentences back into one, because of this design.
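
As a rough illustration of that kind of merge (this is not the actual SentenceAdjuster code; the class name, the method name, and the use of plain {begin, end} offset pairs instead of cTAKES Sentence annotations are all assumptions made for the sketch), a sentence ending in ":" can be joined with the next sentence when both are on the same line:

```java
import java.util.ArrayList;
import java.util.List;

class ColonMergeSketch {

   /**
    * @param docText   full note text
    * @param sentences sorted, non-overlapping {begin, end} offsets (end exclusive)
    * @return new list in which a sentence ending with ':' is merged with the
    *         following sentence, provided no line break lies between them
    */
   static List<int[]> mergeAfterColon( final String docText, final List<int[]> sentences ) {
      final List<int[]> merged = new ArrayList<>();
      for ( int[] sentence : sentences ) {
         if ( !merged.isEmpty() ) {
            final int[] previous = merged.get( merged.size() - 1 );
            final boolean endsWithColon = previous[ 1 ] > previous[ 0 ]
                                          && docText.charAt( previous[ 1 ] - 1 ) == ':';
            final boolean sameLine = docText.substring( previous[ 1 ], sentence[ 0 ] ).indexOf( '\n' ) < 0;
            if ( endsWithColon && sameLine ) {
               // Extend the previous span instead of adding a new one.
               previous[ 1 ] = sentence[ 1 ];
               continue;
            }
         }
         // Copy so that the caller's arrays are never modified.
         merged.add( new int[]{ sentence[ 0 ], sentence[ 1 ] } );
      }
      return merged;
   }
}
```

For "Hyperlipidemia:  Yes" split as "Hyperlipidemia:" and "Yes", this would give back a single span covering the whole line, while spans on different lines stay separate.
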

Thanks, Tomasz

________________________________________
From: Finan, Sean [sean...@childrens.harvard.edu]
Sent: Friday, July 10, 2015 3:20 PM
To: de...@ctakes.apache.org
Subject: RE: Allergy Annotator

Hi Tom,

It is exactly because the sentence detector splits "KEY:" from "VALUE" that I 
didn't suggest using sentences. Instead, I would just iterate over the whole 
cas collection of medication events and attempt to match allergy phrases 
("allergic to medication") with text in the note spanning from event.begin-15 to
event.end+15 or whatever window size you prefer.
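
A minimal sketch of that window check, using plain offsets so it runs without cTAKES. The cue expression, the class name and the method name are assumptions for illustration; in a real annotator begin and end would come from each medication event.

```java
import java.util.regex.Pattern;

class AllergyWindowSketch {

   // Hypothetical cue list; a real one would need more phrases.
   static final Pattern ALLERGY_CUE
         = Pattern.compile( "\\ballerg(?:y|ies|ic)\\b", Pattern.CASE_INSENSITIVE );

   /**
    * @param docText    full note text
    * @param begin      begin offset of the medication mention
    * @param end        end offset (exclusive) of the medication mention
    * @param windowSize number of characters to look at on each side
    * @return true if an allergy cue occurs within [begin - windowSize, end + windowSize)
    */
   static boolean hasAllergyCue( final String docText, final int begin, final int end,
                                 final int windowSize ) {
      final int from = Math.max( 0, begin - windowSize );
      final int to = Math.min( docText.length(), end + windowSize );
      return ALLERGY_CUE.matcher( docText.substring( from, to ) ).find();
   }
}
```

One thing the sketch makes visible: with "ALLERGIES: PENICILLIN, WHEAT" a window of 15 reaches the cue from PENICILLIN but not from WHEAT, so list-style sections may need a larger window or a separate rule.
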

Sean

-----Original Message-----
From: Tom Devel [mailto:deve...@gmail.com]
Sent: Friday, July 10, 2015 4:12 PM
To: de...@ctakes.apache.org
Subject: Re: Allergy Annotator

Sean and Dima, these are great suggestions, thanks so far.

Sean, when looping over medication events as you say, I can see how it is 
possible to take the textspan.Sentence of this MedicationMention, and then do a 
regex check for the phrase structure as Dima said.

But instead of textspan.Sentence, you mention "see if any is included in a phrase".
What cTAKES/UIMA class is related to this?

Because if I would use textspan.Sentence, it would work for "The patient is 
allergic to penicillin.", but cTAKES splits "ALLERGIES: PENICILLIN, WHEAT" into 
two sentences, so that the MedicationMentions here would not be in the same 
sentence as the word "ALLERGIES".

Thanks again, Tom

On Fri, Jul 10, 2015 at 2:12 PM, Finan, Sean <sean...@childrens.harvard.edu> wrote:

Hi Dima, Tom,

I was thinking the same as Dima's first solution. Iterate through the 
medication events and see if any is included in a phrase as mentioned in Tom's 
original email. Each phrase structure would have to be specified beforehand. 
However, assigning appropriate CUIs would require having a lookup table for 
each medication allergy. I think that would be the simplest solution.

Sean

-----Original Message-----
From: Dligach, Dmitriy [mailto:dmit...@childrens.harvard.edu]
Sent: Friday, July 10, 2015 2:50 PM
To: cTAKES Developer list
Subject: Re: Allergy Annotator

Hi Tom,

If the patterns are pretty simple, you could just add a few rules on top of the 
cTAKES dictionary lookup output. Something of the kind "allergic to 
<medication>" or "allergies: <medication1>, <medication2>, <substance1>, ...".

If these patterns are hard to express as rules, you should consider a machine 
learning based sequence labeling route (e.g. something similar to the cTAKES 
chunker).
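
For the rule-based route, the two patterns above could be sketched roughly as follows. This is an illustration only: it assumes the begin offset of each medication mention is already known from dictionary lookup, and the class name, method name and both expressions are made up for the example.

```java
import java.util.regex.Pattern;

class AllergyRuleSketch {

   // Rule 1: the mention is directly preceded by "allergic to".
   static final Pattern ALLERGIC_TO
         = Pattern.compile( "\\ballergic\\s+to\\s+$", Pattern.CASE_INSENSITIVE );

   // Rule 2: the mention sits on a line that starts with "allergies:".
   static final Pattern ALLERGY_LIST
         = Pattern.compile( "^\\s*allergies\\s*:", Pattern.CASE_INSENSITIVE );

   /**
    * @param docText      full note text
    * @param mentionBegin begin offset of a medication or substance mention
    * @return true if either rule matches the text between line start and mention
    */
   static boolean isAllergyMention( final String docText, final int mentionBegin ) {
      // lastIndexOf returns -1 when there is no earlier line break, so lineStart becomes 0.
      final int lineStart = docText.lastIndexOf( '\n', mentionBegin - 1 ) + 1;
      final String prefix = docText.substring( lineStart, mentionBegin );
      return ALLERGIC_TO.matcher( prefix ).find() || ALLERGY_LIST.matcher( prefix ).find();
   }
}
```

Because the check works on the line rather than on the Sentence, it is not affected by the sentence detector splitting "ALLERGIES:" from the items that follow it.
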

Dima

--
Dmitriy (Dima) Dligach, Ph.D.
Boston Children's Hospital and Harvard Medical School
(617) 651-0397

On Jul 10, 2015, at 13:40, Tom Devel <deve...@gmail.com> wrote:

Sean,

It would be a wider net, such that if an allergy is mentioned in the clinical 
note, this is captured in the corresponding IdentifiedAnnotation (or 
alternatively, if the IdentifiedAnnotation class should not be changed with a 
new attribute, in a separate allergy annotation).

This annotator would then have to of course run after the clinical pipeline has 
run and discovered all IdentifiedAnnotations.

I am familiar with writing UIMA/cTAKES annotators, but not sure how a new ML 
method could be integrated here for detecting allergies. Do you have any 
thoughts about how to approach this in general?

Thanks, Tom

On Fri, Jul 10, 2015 at 11:54 AM, Finan, Sean <sean...@childrens.harvard.edu> wrote:

Hi Tom,

Are you interested in catching all allergies or just a few specific allergies 
for a study? If you are only concerned with a few then there is a (possibly) 
simple solution. If you are interested in throwing a wider net then I think 
that a new module would need to be created; does anybody reading this have an 
ML or regex style module?

Sean

-----Original Message-----
From: Tom Devel [mailto:deve...@gmail.com]
Sent: Friday, July 10, 2015 12:42 PM
To: de...@ctakes.apache.org
Subject: Allergy Annotator

Hi,

I would like to use/extend cTAKES to detect allergies.

In the cTAKES publication (2010)

http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2995668/

there is the mention that: "Allergies to a given medication are 
handled by setting the negation attribute of that medication to 'is negated'."

However, in a post here in 2014 (RE: Allergy Indication) it is said that cTAKES 
does not have a module for allergy discovery.

1. What is the current status of allergy detection in cTAKES?

2. I did some testing, while cTAKES discovers concepts about allergies ("wheat 
allergy" is found as C0949570), using "ALLERGIES: PENICILLIN, WHEAT" or "The 
patient is allergic to penicillin." does not give penicillin or wheat 
annotations allergy status.

How would I go about detecting these allergy mentions?

Thanks, Tom
