Pei, I did what you recommended: I ran a test input through the new pipeline and diffed its CAS file against the one from the clinical pipeline without the smoking status. It seems to do the trick. The UMLS concept tags are still the same, and there is now a new tag for the smoking status annotation, great!
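
For reference, a sketch of what the merged fixed flow could look like under the recommendation quoted below: ExternalBaseAggregateTAE dropped, and the two remaining smoking status nodes appended at the end. Placing them after ExtractionPrepAnnotator is an assumption, as is having matching delegateAnalysisEngine entries for SentenceAdjuster and ClassifiableEntriesAnnotator in the aggregate descriptor. This is an illustration, not a tested descriptor.

```xml
<fixedFlow>
  <node>SimpleSegmentAnnotator</node>
  <node>SentenceDetectorAnnotator</node>
  <node>TokenizerAnnotator</node>
  <node>LvgAnnotator</node>
  <node>ContextDependentTokenizerAnnotator</node>
  <node>POSTagger</node>
  <node>Chunker</node>
  <node>AdjustNounPhraseToIncludeFollowingNP</node>
  <node>AdjustNounPhraseToIncludeFollowingPPNP</node>
  <node>DictionaryLookupAnnotatorDB</node>
  <node>DrugNER</node>
  <node>DependencyParser</node>
  <node>SemanticRoleLabeler</node>
  <node>ConstituencyParser</node>
  <node>GenericCleartkAnalysisEngine</node>
  <node>HistoryCleartkAnalysisEngine</node>
  <node>PolarityCleartkAnalysisEngine</node>
  <node>SubjectCleartkAnalysisEngine</node>
  <node>UncertaintyCleartkAnalysisEngine</node>
  <node>ExtractionPrepAnnotator</node>
  <!-- Assumed placement: smoking status nodes appended last.
       ExternalBaseAggregateTAE is left out because the nodes above
       already cover sectionizer, sentence, tokenizer and lvg. -->
  <node>SentenceAdjuster</node>
  <node>ClassifiableEntriesAnnotator</node>
</fixedFlow>
```
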

Before I create the Jira item, what do you mean by removing the last NegEx? In AggregatePlaintextFastUMLSProcessor, the NegationAnnotator node is commented out:

<!-- <node>NegationAnnotator</node> -->

Did you mean this node? At the top of the file there is an import for the NegationAnnotator, <delegateAnalysisEngine key="NegationAnnotator">, which is not commented out, but the annotator is never run in the fixed flow.

Am I correct that negation detection in the clinical pipeline is now performed by PolarityCleartkAnalysisEngine?

Thanks,
Tom

On Sat, Apr 18, 2015 at 12:53 AM, Pei Chen <[email protected]> wrote:

> Tom,
> I would put it at the end of the pipeline (at a min, it should be behind
> sectionizer, sentence, tokenizer, lvg). I would remove
> ExternalBaseAggregateTAE, as this simulates the sectionizer, sentence,
> tokenizer, lvg, which would be redundant. I would also probably remove
> the last NegEx, which could override the assertion values.
>
> Disclaimer: I did not test this yet. Feel free to open a Jira item if it
> works for you so it can be tracked. It seems kind of strange to have a
> descriptor xml define another xml descriptor to be loaded up via code
> again. I think this could be simplified.
> --Pei
>
> On Thu, Apr 16, 2015 at 7:29 PM, Tom Devel <[email protected]> wrote:
>
> > Hi,
> >
> > I am using the smoking status AE from SimulatedProdSmokingTAE.xml, it
> > works fine, I can see the smoking status annotation in the CVD.
> >
> > Now I would like to include the smoking status detection in the clinical
> > pipeline of AggregatePlaintextFastUMLSProcessor.xml, so that when I run
> > the clinical pipeline, the smoking status will also be determined.
> >
> > How can I do this?
> >
> > I am thinking to just put the nodes from the fixed flow of
> > SimulatedProdSmokingTAE.xml into the fixed flow of
> > AggregatePlaintextFastUMLSProcessor.xml, is this the right approach?
> >
> > If so, at which exact place in the clinical pipeline fixed flow should
> > these nodes be added?
> >
> > Is there a preferred place (such as append after the last node or put
> > before the first node)?
> >
> > Can a wrong position or ordering of the smoking status nodes
> > damage/corrupt the rest of the annotations?
> >
> > SimulatedProdSmokingTAE.xml contains these lines with the fixed flow:
> >
> > <fixedFlow>
> >   <node>ExternalBaseAggregateTAE</node>
> >   <node>SentenceAdjuster</node>
> >   <node>ClassifiableEntriesAnnotator</node>
> > </fixedFlow>
> >
> > AggregatePlaintextFastUMLSProcessor.xml (3.2.2 from SVN) contains this
> > fixed flow:
> >
> > <fixedFlow>
> >   <node>SimpleSegmentAnnotator</node>
> >   <node>SentenceDetectorAnnotator</node>
> >   <node>TokenizerAnnotator</node>
> >   <node>LvgAnnotator</node>
> >   <node>ContextDependentTokenizerAnnotator</node>
> >   <node>POSTagger</node>
> >   <!-- <node>ClearPOSTagger</node> -->
> >   <node>Chunker</node>
> >   <node>AdjustNounPhraseToIncludeFollowingNP</node>
> >   <node>AdjustNounPhraseToIncludeFollowingPPNP</node>
> >   <!--<node>LookupWindowAnnotator</node>-->
> >   <node>DictionaryLookupAnnotatorDB</node>
> >   <node>DrugNER</node>
> >   <node>DependencyParser</node>
> >   <node>SemanticRoleLabeler</node>
> >   <node>ConstituencyParser</node>
> >   <!-- <node>AssertionAnnotator</node> -->
> >   <!-- <node>StatusAnnotator</node> -->
> >   <!-- <node>NegationAnnotator</node> -->
> >   <node>GenericCleartkAnalysisEngine</node>
> >   <node>HistoryCleartkAnalysisEngine</node>
> >   <node>PolarityCleartkAnalysisEngine</node>
> >   <node>SubjectCleartkAnalysisEngine</node>
> >   <node>UncertaintyCleartkAnalysisEngine</node>
> >
> >   <node>ExtractionPrepAnnotator</node>
> > </fixedFlow>
> >
> > Thanks for any help or pointers,
> >
> > Tom
