Sean,

Thanks for the detailed answer. I will take a look and update this thread if I find the cause.
Jeff

On Thu, Feb 6, 2020 at 9:13 AM Finan, Sean <sean.fi...@childrens.harvard.edu> wrote:
> Hi Jeff,
>
> I think sentence splitting may be causing this behavior, and it is
> worth checking.
>
> You can get some quick debug output by adding a writer to the end of your
> pipeline:
>
> add pretty.plaintext.PrettyTextWriterFit SubDirectory=POS
>
> The SubDirectory= parameter is optional.
> Part of the file this writer creates lists the output sentence by
> sentence. That should show you how the sentence splitter behaves in
> each case.
>
> If the sentence splitter is the cause, you could use a different lookup
> window in the dictionary lookup and see whether your results improve or
> get worse. In the piper file, insert this line above the line that adds
> the dictionary lookup:
>
> set windowAnnotations=Section
>
> or, if you are using a paragraph parser:
>
> set windowAnnotations=Paragraph
>
> Sean
>
>
> ________________________________________
> From: Jeffrey Miller <jeff...@gmail.com>
> Sent: Wednesday, February 5, 2020 12:24 PM
> To: dev@ctakes.apache.org
> Subject: DefaultJCasTermAnnotator behavior with period and semicolon in
> UMLS terms [EXTERNAL]
>
> Hi,
>
> I've noticed a problem with terms that contain a period or a semicolon.
> For example, the sno_rx_16ab dictionary contains "antibody ; toxoplasma".
> This term is not found when the semicolon is attached to the first word
> ("antibody; toxoplasma"). It is found when the text is written either
> "antibody ; toxoplasma" or "antibody ;toxoplasma". A period in the same
> position behaves the same way. My first instinct was that the sentence
> splitter was responsible, since sentences are the default lookup window.
> I found an older discussion of this involving periods in gene names, but
> it was from a while back. I'm curious whether anyone has dealt with this
> issue.
>
> Thanks,
> Jeff
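The toy Python model below makes the sentence-window hypothesis concrete. It is not cTAKES code. The tokenizer, the splitter, and the `lookup` helper are all hypothetical. The splitter is written only to mimic the behavior reported in the thread: it ends a "sentence" after a word that has ";" or "." attached to it, but not after a free-standing ";". The model shows how a term that is split across two sentence windows is missed, and how a wider window (analogous to windowAnnotations=Section) recovers it.

```python
import re

# Hypothetical tokenizer: words and individual punctuation marks become separate tokens.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def tokenize(text):
    return TOKEN_RE.findall(text.lower())


def toy_sentence_split(text):
    """Hypothetical splitter, NOT the cTAKES one: it ends a sentence after a
    whitespace-delimited chunk like 'antibody;' or 'antibody.' (punctuation
    attached to a word), but not after a free-standing ';' or '.'."""
    sentences, current = [], []
    for chunk in text.split():
        current.append(chunk)
        if len(chunk) > 1 and chunk[-1] in ".;":
            sentences.append(" ".join(current))
            current = []
    if current:
        sentences.append(" ".join(current))
    return sentences


def contains_term(window_tokens, term_tokens):
    # True if the term's token sequence appears contiguously inside one window.
    n = len(term_tokens)
    return any(window_tokens[i:i + n] == term_tokens
               for i in range(len(window_tokens) - n + 1))


def lookup(text, term, window="sentence"):
    # Hypothetical window-bounded lookup: a term must fit entirely in one window.
    term_tokens = tokenize(term)
    windows = toy_sentence_split(text) if window == "sentence" else [text]
    return any(contains_term(tokenize(w), term_tokens) for w in windows)


TERM = "antibody ; toxoplasma"
for text in ["antibody; toxoplasma", "antibody ; toxoplasma", "antibody ;toxoplasma"]:
    print(f"{text!r}: sentence={lookup(text, TERM)}, "
          f"section={lookup(text, TERM, window='section')}")
```

Under these assumptions, only the attached-semicolon form fails with sentence windows, and the whole-text window finds it. The PrettyTextWriterFit output would show whether the real splitter actually breaks the text this way.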