Dmitriy Dligach created CTAKES-449:
--------------------------------------

             Summary: PolarityCleartkAnalysisEngine slow for large documents
                 Key: CTAKES-449
                 URL: https://issues.apache.org/jira/browse/CTAKES-449
             Project: cTAKES
          Issue Type: Improvement
          Components: ctakes-assertion
            Reporter: Dmitriy Dligach
As soon as I add the negation AE to the end of my pipeline:

    aggregateBuilder.add( PolarityCleartkAnalysisEngine.createAnnotatorDescription() );

the pipeline becomes 50-100 times slower. This is likely caused by the following line in AssertionCleartkAnalysisEngine:

    List<Sentence> sents = new ArrayList<>(JCasUtil.selectCovering(jCas, Sentence.class, entityOrEventMention.getBegin(), entityOrEventMention.getEnd()));

I am running the pipeline on large files (i.e., files with a large number of sentences). For each identified annotation, this call goes through all the sentences in the document to find the ones that cover it, so the total cost grows roughly with the number of mentions times the number of sentences.

The full pipeline is here: https://github.com/dmitriydligach/ctakes-misc/blob/master/src/main/java/org/apache/ctakes/pipelines/UmlsLookupPipeline.java

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
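To illustrate the scaling problem described above, here is a minimal, self-contained sketch that does not use the cTAKES or uimaFIT APIs. `naiveCovering` mimics a per-mention linear scan over all sentences (the pattern behind the slow `selectCovering` call), while `findCovering` sorts nothing at query time and instead binary-searches a sentence list that is assumed to be sorted by begin offset and non-overlapping, so at most one sentence can cover a mention. The `Span` class, `buildSentences`, and both lookup methods are hypothetical names chosen for this example. Within uimaFIT itself, `JCasUtil.indexCovering` precomputes covering annotations in a single pass and is another option; which approach suits AssertionCleartkAnalysisEngine best is left open here.

```java
import java.util.ArrayList;
import java.util.List;

class CoveringSentenceIndex {

    // Hypothetical stand-in for a Sentence annotation: character offsets only.
    static final class Span {
        final int begin;
        final int end;
        Span(int begin, int end) { this.begin = begin; this.end = end; }
    }

    // Builds n sentences of 10 characters each, separated by a 1-character gap.
    static List<Span> buildSentences(int n) {
        List<Span> sentences = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            sentences.add(new Span(i * 11, i * 11 + 10));
        }
        return sentences;
    }

    // Linear scan: visits every sentence on every call, O(#sentences) per mention.
    static List<Span> naiveCovering(List<Span> sentences, int begin, int end) {
        List<Span> result = new ArrayList<>();
        for (Span s : sentences) {
            if (s.begin <= begin && s.end >= end) {
                result.add(s);
            }
        }
        return result;
    }

    // Binary search, O(log #sentences) per mention.
    // ASSUMPTION: sentences are sorted by begin offset and do not overlap.
    static Span findCovering(List<Span> sorted, int begin, int end) {
        int lo = 0, hi = sorted.size() - 1, candidate = -1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (sorted.get(mid).begin <= begin) {
                candidate = mid;   // last sentence starting at or before the mention so far
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        if (candidate < 0) {
            return null;           // mention starts before the first sentence
        }
        Span s = sorted.get(candidate);
        return s.end >= end ? s : null;  // null if the mention crosses a sentence boundary
    }

    public static void main(String[] args) {
        List<Span> sentences = buildSentences(2000);
        boolean agree = true;
        for (int i = 0; i < sentences.size(); i++) {
            int begin = i * 11 + 2, end = i * 11 + 5;  // one mention inside each sentence
            List<Span> slow = naiveCovering(sentences, begin, end);
            Span fast = findCovering(sentences, begin, end);
            agree &= slow.size() == 1 && slow.get(0) == fast;
        }
        System.out.println("agree: " + agree);  // prints "agree: true"
    }
}
```

With m mentions and n sentences, the linear scan costs on the order of m × n comparisons, whereas the binary search costs on the order of m × log n, which is why the gap widens as documents get longer.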