Hi Jeff,

I think that sentence splitting is possibly a cause of this behavior and is worth checking.
You can get some quick debug output by adding a writer to the end of your pipeline:

    add pretty.plaintext.PrettyTextWriterFit SubDirectory=POS

The SubDirectory= parameter is optional. This writer creates a file that (in part) lists output sentence by sentence, so you should be able to see how the sentence splitter is behaving in each circumstance.

If it is the sentence splitter, then you could try using a different lookup window in the dictionary lookup and see whether your results improve or get worse. In the piper file, just insert above the line that adds the dictionary lookup:

    set windowAnnotations=Section

or, if you are using a paragraph parser:

    set windowAnnotations=Paragraph

Sean

________________________________________
From: Jeffrey Miller <jeff...@gmail.com>
Sent: Wednesday, February 5, 2020 12:24 PM
To: dev@ctakes.apache.org
Subject: DefaultJCasTermAnnotator behavior with period and semicolon in UMLS terms [EXTERNAL]

Hi,

I've noticed that if a term contains a period or a semicolon (for example, "antibody ; toxoplasma" from the sno_rx_16ab dictionary), it will not be found if the semicolon is attached to the first word, but will be found if the text is either "antibody ; toxoplasma" or "antibody ;toxoplasma". There is similar behavior with a period in the same place.

My first instinct was that this had to do with the sentence splitter and sentences being the default lookup window. I found an older discussion about this in reference to periods in gene names, but it was from a while back. Just curious if anyone has dealt with this issue.

Thanks,
Jeff