Thank you so much, Tim, for the prompt response. I appreciate the additional
info and suggestions you provided. Yes, I see that it is the i2b2
challenge 2010 dataset.
Can I just ask which machine learning algorithm is being used?
Thanks.
Regards,
Paula
> From: [email protected]
> To: [email protected]
> Subject: Re: sentence splitter & forks/branches
> Date: Sat, 18 Jan 2014 13:02:06 +0000
>
> Sorry Paula, it's been a busy few weeks. I'm sure everyone else has been
> busy as well.
>
> I'm sorry to say I think at this point it might be difficult to get the
> exact fix you want out of the module. It works in 2 parts I believe:
> 1) Identify cue words
> 2) Classify entities given the identified cue words.
>
> You fixed 1) to recognize your cue word, but if 2) uses a machine
> learning model it may still get the wrong outcome sometimes, and that can
> be hard to fix. The model obviously wouldn't have seen any training examples
> using that keyword, though I would have thought there might be some cases it
> would get right using other features.
>
> If you've tried a bunch of different examples and it seems like it can't
> get any of them right with new cue words, then there are a few things
> you might consider as next steps:
>
> 1) Write your own rule-based analysis engine to follow the existing
> assertion module and use some simple algorithm to link your cue words
> with nearby entities.
> 2) Acquire training data and try to re-train the assertion module with
> your cue word additions. I believe they used the i2b2 challenge 2010
> concept assertion dataset which is available with a data use agreement.
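
The rule-based option in 1) could start from something like the following Java sketch. It is illustrative only: the class `CueLinker`, the method `isNegated`, the cue list, and the three-token window are all assumptions rather than anything in cTAKES, and a real analysis engine would read tokens and entities from the CAS instead of a plain list.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical helper, not part of cTAKES: links cue words to a nearby entity. */
final class CueLinker {

    private CueLinker() {}

    /**
     * Returns true if any cue word occurs within `window` tokens before the
     * entity token at `entityIndex`. All tokens are assumed to belong to the
     * same sentence, so the cue can never cross a sentence boundary.
     */
    static boolean isNegated(List<String> tokens, int entityIndex, Set<String> cues, int window) {
        int start = Math.max(0, entityIndex - window);
        for (int i = start; i < entityIndex; i++) {
            // Compare case-insensitively; the cue set is assumed to be lower case.
            if (cues.contains(tokens.get(i).toLowerCase(Locale.ROOT))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Assumed cue list; a custom cue word would simply be added here.
        Set<String> cues = new HashSet<>(Arrays.asList("not", "denies", "without"));
        List<String> sentence = Arrays.asList("patient", "did", "not", "have", "diabetes");
        // "diabetes" is token 4; look back at most 3 tokens for a cue.
        System.out.println(isNegated(sentence, 4, cues, 3));
    }
}
```

A fixed look-back window is the simplest possible linking rule. It ignores scope terminators such as "but", so it is only a starting point.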
>
> Hope this helps,
> Tim
>
>
>
> On 01/17/2014 10:46 PM, digital paula wrote:
> >
> >
> > Hello again cTAKES Community, I thought that adding the sentence
> > splitter (with newline sentence-continuation recognition) would be as
> > simple as adding the sectionizer annotator to the Eclipse
> > environment. I see per VJ's note that it's not that simple: my
> > understanding is that the standard clinical pipeline requires the assertion
> > and dependency parsers. I've explored a bit of the changes needed, and at
> > least for assertion it looks like SentenceDetector, SentenceSpan, and likely
> > the SingleDocumentProcessor from the MITRE jar will need to be modified to
> > recognize multi-line sentences. This is so the assertion and dependency
> > parsers can be kept in the pipeline. I would love to devote the time
> > needed to fix the sentence splitter to recognize multi-line sentences,
> > but I need to focus on hacking my way through the cue word issue because
> > I've been left in the lurch with no response to my posts :-(((((
> > Regards,
> > Paula
> >
> >> Date: Wed, 15 Jan 2014 14:53:17 -0500
> >> Subject: Re: sentence splitter & forks/branches
> >> From: [email protected]
> >> To: [email protected]
> >>
> >> It is unfortunately not that trivial, as allowing newlines within sentences
> >> requires changes to the assertion and dependency parser modules.
> >>
> >> If you're not using those AEs you could theoretically build the ytex
> >> branch, and just add ctakes-ytex-uima.jar and
> >> ctakes-ytex-uima\desc\analysis_engine\SentenceDetectorAnnotator.xml to your
> >> existing ctakes install (haven't tried it, but it should work).
> >>
> >> -vj
> >>
> >>
> >> On Wed, Jan 15, 2014 at 1:57 PM, Lingren, Todd
> >> <[email protected]> wrote:
> >>
> >>> I have a general question about forks, specifically the YTEX branch that
> >>> Vijay mentions.
> >>> If I wanted to implement just the sentence splitter from YTEX into a
> >>> currently existing 3.1 install, how would I do that? Is it possible? Or do
> >>> I have to switch over completely to run from YTEX branch?
> >>>
> >>> Todd Lingren
> >>> Biomedical Informatics
> >>> Cincinnati Children's Hospital
> >>> [email protected]
> >>> 513-803-9032
> >>>
> >>>
> >>> -----Original Message-----
> >>> From: vijay garla [mailto:[email protected]]
> >>> Sent: Wednesday, January 15, 2014 11:34 AM
> >>> To: [email protected]
> >>> Subject: Re: svn commit: r1551805 -
> >>> /ctakes/branches/ytex/ctakes-assertion/src/main/java/org/apache/ctakes/assertion/medfacts/i2b2/api/CharacterOffsetToLineTokenConverterCtakesImpl.java
> >>>
> >>> The issue is indeed the sentence splitter - negation is limited to words
> >>> within the sentence, and if newlines are considered sentence boundaries,
> >>> it doesn't work properly (splitting on newlines breaks many other things
> >>> as well). The YTEX branch includes a sentence splitter that does not
> >>> automatically split sentences on newlines.
> >>>
> >>> best,
> >>>
> >>> vj
> >>>
> >>>
> >>> On Wed, Jan 15, 2014 at 10:03 AM, Masanz, James J. <[email protected]> wrote:
> >>>> Hi Paula,
> >>>>
> >>>> The sentence detector in 3.1.0 and 3.1.1 (and previous releases)
> >>>> assumes sentences don't cross line boundaries.
> >>>> OpenNLP is used to find sentence breaks, but then if newlines are
> >>>> found, those are also set (within cTAKES, not OpenNLP) to be sentence
> >>>> breaks.
> >>>> (just FYI I haven't had a chance to look at the ytex branch, which the
> >>>> subject commit is about)
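
The behaviour described above, with newlines forced to be sentence breaks after the model has run, can be illustrated with a small standalone Java sketch. It does not call OpenNLP or cTAKES: `NewlineSplitDemo`, `split`, and the punctuation regex are hypothetical stand-ins for the trained sentence model, chosen only to show how the extra newline rule separates a negation cue from its term.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the sentence detection step; it does not call OpenNLP. */
final class NewlineSplitDemo {

    private NewlineSplitDemo() {}

    /**
     * Splits text after '.', '!' or '?' followed by whitespace (a crude
     * placeholder for a trained model). When breakOnNewline is true, every
     * line break additionally ends a sentence.
     */
    static List<String> split(String text, boolean breakOnNewline) {
        List<String> sentences = new ArrayList<>();
        for (String chunk : text.split("(?<=[.!?])\\s+")) {
            String[] parts = breakOnNewline ? chunk.split("\\R+") : new String[] { chunk };
            for (String part : parts) {
                // Collapse remaining line breaks and runs of spaces into one space.
                String cleaned = part.replaceAll("\\s+", " ").trim();
                if (!cleaned.isEmpty()) {
                    sentences.add(cleaned);
                }
            }
        }
        return sentences;
    }

    public static void main(String[] args) {
        String note = "patient did not have\ndiabetes";
        // With newline breaks, "not" and "diabetes" land in different sentences.
        System.out.println(split(note, true));
        // Without them, the cue and the term stay in one sentence.
        System.out.println(split(note, false));
    }
}
```

Because negation only looks within a sentence, the first call leaves "diabetes" with no cue in scope, which matches the two-line narrative that was not negated.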
> >>>>
> >>>> -- James
> >>>>
> >>>> -----Original Message-----
> >>>> From: [email protected] [mailto:
> >>>> [email protected]] On Behalf Of
> >>>> digital paula
> >>>> Sent: Tuesday, January 14, 2014 10:25 PM
> >>>> To: [email protected]
> >>>> Subject: RE: svn commit: r1551805 -
> >>>> /ctakes/branches/ytex/ctakes-assertion/src/main/java/org/apache/ctakes
> >>>> /assertion/medfacts/i2b2/api/CharacterOffsetToLineTokenConverterCtakes
> >>>> Impl.java
> >>>>
> >>>> Hello cTAKES Developer Community,
> >>>> I'm a little behind on reading posts; this one is from last month.
> >>>> I think this issue is already addressed in the current release? I'm still
> >>>> running the previous release, 3.1.0.
> >>>> I just noticed something interesting: the negation didn't take when the
> >>>> negated term is on a different line. I removed all carriage returns from
> >>>> the narratives, and negation picked it up as long as the text is treated
> >>>> as one long string. To better explain what I mean, here are two
> >>>> narrative comments:
> >>>>
> >>>> 1. patient did not have diabetes
> >>>> 2. patient did not have
> >>>> diabetes
> >>>>
> >>>> Number 1 above got negated but number 2 did not. This might be related
> >>>> to the issue with the sectionizer. I noticed that when I treated the
> >>>> narrative as one string, the sectionizer never crashed with the NPE. Of
> >>>> course, the sectionizer serves no purpose if the narrative is one string,
> >>>> but it's helping me pinpoint the problem.
> >>>>
> >>>> Regards,
> >>>> Paula
> >>>>
> >>>>
> >>>>> Date: Thu, 19 Dec 2013 11:04:57 -0500
> >>>>> Subject: Re: FW: svn commit: r1551805 -
> >>>> /ctakes/branches/ytex/ctakes-assertion/src/main/java/org/apache/ctakes
> >>>> /assertion/medfacts/i2b2/api/CharacterOffsetToLineTokenConverterCtakes
> >>>> Impl.java
> >>>>> From: [email protected]
> >>>>> To: [email protected]
> >>>>>
> >>>>> Hi Pei,
> >>>>>
> >>>>> I'm not sure if that would solve the problem: the change in the ytex
> >>>>> branch causes newlines to be ignored (i.e. not treated as a token).
> >>>>> Trunk's sentence splitter splits sentences on newlines, so newlines
> >>>>> would never be found in a sentence. However, if we had a reproducer
> >>>>> we could check it fairly easily in the ytex branch.
> >>>>>
> >>>>> Best,
> >>>>>
> >>>>> VJ
> >>>>>
> >>>>>
> >>>>> On Thu, Dec 19, 2013 at 10:15 AM, Chen, Pei
> >>>>> <[email protected]> wrote:
> >>>>>
> >>>>>> Vj,
> >>>>>> Do you think this is what was causing the NPE's [1]?
> >>>>>> If so, shall we make the same fix in trunk?
> >>>>>> --Pei
> >>>>>>
> >>>>>> [1]
> >>>>>>
> >>>> http://mail-archives.apache.org/mod_mbox/ctakes-dev/201309.mbox/%3C924
> >>>> DE05C19409B438EB81DE683A942D9105A93CB%40CHEXMBX1A.CHBOSTON.ORG%3E
> >>>>>> -----Original Message-----
> >>>>>> From: [email protected] [mailto:[email protected]]
> >>>>>> Sent: Tuesday, December 17, 2013 9:15 PM
> >>>>>> To: [email protected]
> >>>>>> Subject: svn commit: r1551805 -
> >>>>>>
> >>>> /ctakes/branches/ytex/ctakes-assertion/src/main/java/org/apache/ctakes
> >>>> /assertion/medfacts/i2b2/api/CharacterOffsetToLineTokenConverterCtakes
> >>>> Impl.java
> >>>>>> Author: vjapache
> >>>>>> Date: Wed Dec 18 02:14:13 2013
> >>>>>> New Revision: 1551805
> >>>>>>
> >>>>>> URL: http://svn.apache.org/r1551805
> >>>>>> Log:
> >>>>>> add support for sentences that contain newline tokens.
> >>>>>>
> >>>>>> Modified:
> >>>>>>
> >>>>>>
> >>>> ctakes/branches/ytex/ctakes-assertion/src/main/java/org/apache/ctakes/
> >>>> assertion/medfacts/i2b2/api/CharacterOffsetToLineTokenConverterCtakesI
> >>>> mpl.java
> >>>>>> Modified:
> >>>>>>
> >>>> ctakes/branches/ytex/ctakes-assertion/src/main/java/org/apache/ctakes/
> >>>> assertion/medfacts/i2b2/api/CharacterOffsetToLineTokenConverterCtakesI
> >>>> mpl.java
> >>>>>> URL:
> >>>>>>
> >>>> http://svn.apache.org/viewvc/ctakes/branches/ytex/ctakes-assertion/src
> >>>> /main/java/org/apache/ctakes/assertion/medfacts/i2b2/api/CharacterOffs
> >>>> etToLineTokenConverterCtakesImpl.java?rev=1551805&r1=1551804&r2=155180
> >>>> 5&view=diff
> >>>>>>
> >>>>>> ==============================================================================
> >>>>>> --- ctakes/branches/ytex/ctakes-assertion/src/main/java/org/apache/ctakes/assertion/medfacts/i2b2/api/CharacterOffsetToLineTokenConverterCtakesImpl.java (original)
> >>>>>> +++ ctakes/branches/ytex/ctakes-assertion/src/main/java/org/apache/ctakes/assertion/medfacts/i2b2/api/CharacterOffsetToLineTokenConverterCtakesImpl.java Wed Dec 18 02:14:13 2013
> >>>>>> @@ -32,8 +32,8 @@ import org.apache.uima.jcas.tcas.Annotat
> >>>>>>  import org.mitre.medfacts.i2b2.api.ApiConcept;
> >>>>>>  import org.mitre.medfacts.zoner.CharacterOffsetToLineTokenConverter;
> >>>>>>  import org.mitre.medfacts.zoner.LineAndTokenPosition;
> >>>>>> -
> >>>>>>  import org.apache.ctakes.typesystem.type.syntax.BaseToken;
> >>>>>> +import org.apache.ctakes.typesystem.type.syntax.NewlineToken;
> >>>>>>  import org.apache.ctakes.typesystem.type.textspan.Sentence;
> >>>>>>
> >>>>>>  public class CharacterOffsetToLineTokenConverterCtakesImpl implements CharacterOffsetToLineTokenConverter
> >>>>>> @@ -78,11 +78,13 @@ public class CharacterOffsetToLineTokenC
> >>>>>>      for (Annotation current : annotationIndex)
> >>>>>>      {
> >>>>>>        BaseToken bt = (BaseToken)current;
> >>>>>> -      int begin = bt.getBegin();
> >>>>>> -      int end = bt.getEnd();
> >>>>>> -
> >>>>>> -      tokenBeginEndTreeSet.add(begin);
> >>>>>> -      tokenBeginEndTreeSet.add(end);
> >>>>>> +      // filter out NewlineToken
> >>>>>> +      if (!(bt instanceof NewlineToken)) {
> >>>>>> +        int begin = bt.getBegin();
> >>>>>> +        int end = bt.getEnd();
> >>>>>> +        tokenBeginEndTreeSet.add(begin);
> >>>>>> +        tokenBeginEndTreeSet.add(end);
> >>>>>> +      }
> >>>>>>      }
> >>>>>>    }
> >>>>>>