Just FYI, a lot of tokens start to get the IN part-of-speech tag when the 
sentence breaks differ.  For that reason we removed the exclusion of IN from 
the dictionary lookup in another project using the BIO detector:

// This is the same as the default list except that "IN" is not excluded
set exclusionTags="VB,VBD,VBG,VBN,VBP,VBZ,CC,CD,DT,EX,LS,MD,PDT,POS,PP,PP$,PRP,PRP$,RP,TO,WDT,WP,WPS,WRB"

If things still go missing you can simply not exclude any part of speech from 
lookup, which is what I do in yet another project.
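
To illustrate why the tag list matters, here is a minimal sketch of the 
filtering idea. It is not cTAKES code: the (word, tag) token representation and 
the helper names parse_tags and lookup_candidates are assumptions made up for 
this example. The default list is reconstructed only from the statement above 
that it equals the shown list plus IN.

```python
# Illustrative sketch only; not the cTAKES implementation.
# Tokens are modelled as (word, pos_tag) pairs, which is an assumption.

# The list from the piper line above (IN not excluded).
TAGS_WITHOUT_IN = (
    "VB,VBD,VBG,VBN,VBP,VBZ,CC,CD,DT,EX,LS,MD,PDT,POS,PP,PP$,"
    "PRP,PRP$,RP,TO,WDT,WP,WPS,WRB"
)
# Per the thread, the default is the same list with IN also excluded.
TAGS_DEFAULT = TAGS_WITHOUT_IN + ",IN"


def parse_tags(csv):
    """Turn a comma-separated tag string into a set of tags."""
    return {tag.strip() for tag in csv.split(",") if tag.strip()}


def lookup_candidates(tokens, exclusion_tags):
    """Return the words whose POS tag is not excluded from dictionary lookup."""
    return [word for word, pos in tokens if pos not in exclusion_tags]


# POS tags reported in this thread for the token "aspirin".
tokens_bio = [("aspirin", "IN")]      # with SentenceDetectorAnnotatorBIO
tokens_regular = [("aspirin", "NN")]  # with SentenceDetectorAnnotator

print(lookup_candidates(tokens_bio, parse_tags(TAGS_DEFAULT)))     # IN excluded
print(lookup_candidates(tokens_bio, parse_tags(TAGS_WITHOUT_IN)))  # IN allowed
print(lookup_candidates(tokens_bio, set()))                        # nothing excluded
```

With IN excluded, a token tagged IN never reaches the lookup, which would 
explain a mention going missing; with IN allowed, or with an empty exclusion 
set, it is looked up regardless of the tag.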

Sean


________________________________________
From: Tomasz Oliwa <ol...@uchicago.edu>
Sent: Tuesday, March 13, 2018 6:14 PM
To: dev@ctakes.apache.org
Subject: Re: Sentence splitter [EXTERNAL]

Interesting, with the SentenceDetectorAnnotatorBIO the WordToken "aspirin" gets 
partOfSpeech = "IN", with the regular SentenceDetectorAnnotator it is "NN".

Looks like you were right Tim, since IN stands for preposition or subordinating 
conjunction as defined at 
https://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html

Tomasz

________________________________________
From: Miller, Timothy <timothy.mil...@childrens.harvard.edu>
Sent: Tuesday, March 13, 2018 4:57:36 PM
To: dev@ctakes.apache.org
Subject: Re: Sentence splitter [EXTERNAL]

That sounds bizarre! I can think of two possibilities: a sentence break in the 
middle of the word (unlikely), or the different sentence splits confused the 
POS tagger, so that it tagged the word aspirin with a part of speech that is 
excluded from lookup, like a preposition or something. If you check the token 
annotation on the word aspirin you should be able to see its part-of-speech tag.
Tim

________________________________________
From: Tomasz Oliwa <ol...@uchicago.edu>
Sent: Tuesday, March 13, 2018 5:34 PM
To: dev@ctakes.apache.org
Subject: Re: Sentence splitter [EXTERNAL]

Hi,

I tested SentenceDetectorAnnotatorBIO in cTAKES 4.0.0, simply by replacing 
SentenceDetectorAnnotator.xml with SentenceDetectorAnnotatorBIO.xml in 
AggregatePlaintextFastUMLSProcessor.xml.

While it seemed to work, I noticed that in one example an IdentifiedAnnotation 
was not found, although it was found for the same input with the regular 
SentenceDetectorAnnotator.xml.

Could somebody check this please? Run the cTAKES CVD with the following input 
(without the quotation marks):

"
aspirin

his leg
"

On the machine I tested this on, the MedicationMention does not show up with 
SentenceDetectorAnnotatorBIO, but it does with SentenceDetectorAnnotator.

________________________________________
From: Masoud Rouhizadeh <m...@jhu.edu>
Sent: Tuesday, March 13, 2018 3:02:35 PM
To: dev@ctakes.apache.org
Subject: Re: Sentence splitter [EXTERNAL]

Hi Sean,

Thank you for the pointer. I was able to run the SentenceDetectorAnnotatorBIO 
from ctakes-core. The results are way better than the SentenceDetectorAnnotator 
but I still see some issues such as splitting “Dr.” as a separate sentence 
(most likely due to the period after the abbreviation). Do you think there is a 
way to define an abbreviation list for SentenceDetectorAnnotatorBIO so that it 
knows that this is a word-final (i.e. abbreviation-final) and not a 
sentence-final period?
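
To make the idea concrete, here is a minimal sketch of what such an 
abbreviation list could do, written as a stand-alone post-processing step. 
This is purely hypothetical: I do not know whether 
SentenceDetectorAnnotatorBIO supports anything like it, and the names 
ABBREVIATIONS and merge_abbreviation_splits are made up for illustration.

```python
# Hypothetical sketch; not a cTAKES feature or API.
# Sentences are modelled as plain strings in document order.

ABBREVIATIONS = {"Dr.", "Mr.", "Mrs."}  # example entries, assumed for illustration


def merge_abbreviation_splits(sentences, abbreviations=ABBREVIATIONS):
    """Re-join a sentence with the next one when it ends in a known abbreviation."""
    merged = []
    for sentence in sentences:
        previous_words = merged[-1].split() if merged else []
        if previous_words and previous_words[-1] in abbreviations:
            # The previous "sentence" ended on an abbreviation, so its period
            # is treated as word-final and the split is undone.
            merged[-1] = merged[-1] + " " + sentence
        else:
            merged.append(sentence)
    return merged


print(merge_abbreviation_splits(["Dr.", "Nutritious saw the patient.", "He is well."]))
```

The sketch only undoes splits after the fact; a detector that used the list as 
a feature while deciding on breaks would be the cleaner solution.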

Thanks again,
Masoud





On 3/9/18, 5:35 PM, "Finan, Sean" <sean.fi...@childrens.harvard.edu> wrote:


    Hi Masoud,

    There is a very nice SentenceDetectorAnnotatorBIO in ctakes-core.  It will 
    split sentences based upon features other than just a newline character, 
    which appears to be what you want.

    Sean


    ________________________________________
    From: Masoud Rouhizadeh <m...@jhu.edu>
    Sent: Friday, March 9, 2018 4:41 PM
    To: dev@ctakes.apache.org
    Subject: Sentence splitter [EXTERNAL]

    Hello cTAKES team!



    I was wondering what types of sentence splitters are available in cTAKES? 
    The default sentence splitter does not appear to be the best one. See the 
    output for the demo example from the cTAKES installation guide:



    Dr. Nutritious Medical Nutrition Therapy for Hyperlipidemia Referral from:

    Julie Tester, RD, LD, CNSD Phone contact:

    (555)

    555-1212 Height:

    144 cm Current Weight:

    45 kg Date of current weight: 02-29-2001 Admit Weight:

    [...]



    Thanks so much,

    Masoud





    ----

    Masoud Rouhizadeh, PhD

    NLP Specialist / Software Engineer

    Institute for Clinical and Translational Research

    Johns Hopkins University

    
    http://pages.jh.edu/~mrouhiz1







