The changes to the assertion and dependency parser modules needed to support multiline sentences are in the ytex branch. Another pair of eyes and more testing are always welcome.
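
For anyone following the quoted thread, here is a toy Python sketch (not cTAKES code) of why treating every newline as a sentence break can hide negation. Everything in it is an assumption made for illustration: the regex splitter only stands in for OpenNLP, the cue list is invented, and the names split_sentences and is_negated are hypothetical.

```python
import re


def split_sentences(text, break_on_newline):
    """Toy splitter: break after '.', '!' or '?'; optionally also at every newline."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    if break_on_newline:
        # Mimics the behaviour described in the thread: newlines become sentence breaks.
        sentences = [part for s in sentences for part in s.split("\n") if part.strip()]
    # Collapse internal whitespace so a wrapped sentence reads as one line.
    return [" ".join(s.split()) for s in sentences]


def is_negated(term, sentences, cues=("no", "denies", "without")):
    """A term counts as negated only if a cue word precedes it in the same sentence."""
    for sentence in sentences:
        lower = sentence.lower()
        pos = lower.find(term)
        if pos == -1:
            continue
        words_before = re.findall(r"[a-z]+", lower[:pos])
        return any(cue in words_before for cue in cues)
    return False


# Hypothetical note in which the negated list wraps onto a second line.
note = "Patient denies fever, chills or\nchest pain. Reports mild cough."

print(is_negated("chest pain", split_sentences(note, break_on_newline=True)))   # False
print(is_negated("chest pain", split_sentences(note, break_on_newline=False)))  # True
```

With newline breaks, "chest pain" ends up in its own sentence and the cue "denies" is out of scope, which roughly matches what Paula saw before she removed the carriage returns.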

On Friday, January 17, 2014, digital paula <[email protected]> wrote:

> Hello again cTAKES Community,
> I thought that adding the sentence splitter (w/newline-sentence-continuation-recognition) would have been as simple as it was adding the sectionizer annotator to the eclipse environment. I see per VJ's note that it's not that simple, my understanding is that the standard clinical pipeline requires the assertion and dependency parsers. I've explored a bit of the changes needed and at least for Assertion looks like SentenceDetector, SentenceSpan, likely the SingleDocumentProcessor from the MITRE jar will need to be modified to recognize multi-line sentences. This is so the assertion and dependency parsers can be kept in the pipeline. I would love to devote the time needed to fix the sentence split to recognize sentences that are multiline but I need to focus on hacking my way through the cue word issue because I've been left in the lurch with no response to my posts :-(((((
> Regards,
> Paula
>
> > Date: Wed, 15 Jan 2014 14:53:17 -0500
> > Subject: Re: sentence splitter & forks/branches
> > From: [email protected]
> > To: [email protected]
> >
> > It is unfortunately not that trivial, as allowing newlines within sentences requires changes to the assertion and dependency parser modules.
> >
> > If you're not using those AEs you could theoretically build the ytex branch, and just add ctakes-ytex-uima.jar and ctakes-ytex-uima\desc\analysis_engine\SentenceDetectorAnnotator.xml to your existing ctakes install (haven't tried it, but it should work).
> >
> > -vj
> >
> > On Wed, Jan 15, 2014 at 1:57 PM, Lingren, Todd <[email protected]> wrote:
> >
> > > I have a general question about forks, specifically the YTEX branch that Vijay mentions.
> > > If I wanted to implement just the sentence splitter from YTEX into a currently existing 3.1 install, how would I do that? Is it possible? Or do I have to switch over completely to run from YTEX branch?
> > >
> > > Todd Lingren
> > > Biomedical Informatics
> > > Cincinnati Children's Hospital
> > > [email protected]
> > > 513-803-9032
> > >
> > > -----Original Message-----
> > > From: vijay garla [mailto:[email protected]]
> > > Sent: Wednesday, January 15, 2014 11:34 AM
> > > To: [email protected]
> > > Subject: Re: svn commit: r1551805 - /ctakes/branches/ytex/ctakes-assertion/src/main/java/org/apache/ctakes/assertion/medfacts/i2b2/api/CharacterOffsetToLineTokenConverterCtakesImpl.java
> > >
> > > The issue is indeed the sentence splitter - negation is limited to words within the sentence, and if newlines are considered sentence boundaries, it doesn't work properly (splitting on newlines breaks many other things as well). The YTEX branch includes a sentence splitter that does not automatically split sentences on newlines.
> > >
> > > best,
> > >
> > > vj
> > >
> > > On Wed, Jan 15, 2014 at 10:03 AM, Masanz, James J. <[email protected]> wrote:
> > >
> > > > Hi Paula,
> > > >
> > > > The sentence detector in 3.1.0 and 3.1.1 (and previous releases) assumes sentences don't cross line boundaries.
> > > > OpenNLP is used to find sentence breaks, but then if newlines are found, those are also set (within cTAKES, not OpenNLP) to be sentence breaks.
> > > >
> > > > (just FYI I haven't had a chance to look at the ytex branch, which the subject commit is about)
> > > >
> > > > -- James
> > > >
> > > > -----Original Message-----
> > > > From: [email protected] [mailto:[email protected]] On Behalf Of digital paula
> > > > Sent: Tuesday, January 14, 2014 10:25 PM
> > > > To: [email protected]
> > > > Subject: RE: svn commit: r1551805 - /ctakes/branches/ytex/ctakes-assertion/src/main/java/org/apache/ctakes/assertion/medfacts/i2b2/api/CharacterOffsetToLineTokenConverterCtakesImpl.java
> > > >
> > > > Hello cTAKES Developer Community,
> > > > I'm a little behind on reading posts....this one is from last month.
> > > > I think this issue is already addressed in current release? I'm still running the previous release...3.1.0.
> > > > I just noticed something interesting, the negation didn't take when it is on a different line. I just removed all carriage returns from narratives and negation picked it up as long as it's treated as one long string.
> > > > To
