Tommaso, could you take a look?

-Marshall
On 11/20/2014 3:09 PM, Vadym Oliinyk (JIRA) wrote:
> Vadym Oliinyk created UIMA-4115:
> -----------------------------------
>
>              Summary: TikaAnnotator: incorrect order of tags processing
>                  Key: UIMA-4115
>                  URL: https://issues.apache.org/jira/browse/UIMA-4115
>              Project: UIMA
>           Issue Type: Bug
>           Components: addons
>     Affects Versions: 2.3.1Addons
>             Reporter: Vadym Oliinyk
>
>
> org.apache.uima.tika.MarkupAnnotator outputs incorrect content due to a bug in
> org.apache.uima.tika.MarkupHandler. The problem is located in the end element
> event handler: MarkupHandler#endElement should close open tags by removing
> them from the stack (when the corresponding end tag is found, the most
> recently added tag should be removed first). The current implementation
> instead removes start elements beginning from the first (oldest) open
> element, which results in incorrect text spans being annotated by the
> processor.
>
> The fix is trivial:
> in MarkupHandler#endElement, replace startedAnnotations.iterator() with
> startedAnnotations.descendingIterator().
>
> --
> This message was sent by Atlassian JIRA
> (v6.3.4#6332)
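To illustrate why the iteration direction matters, here is a minimal, self-contained sketch of the endElement logic. It is not the actual MarkupHandler code: it assumes startedAnnotations is a java.util.Deque (e.g. a LinkedList, where add appends at the tail, so iterator() yields the oldest open tag first and descendingIterator() the newest), and that an end tag is matched to an open tag by name. OpenTag, Span, endElement and run are hypothetical names, and the SAX events for `<div>a<div>b</div>c</div>` are simulated with text offsets only.

```java
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;

class EndElementSketch {

    // Hypothetical stand-in for an element whose start tag has been seen.
    static final class OpenTag {
        final String name;
        final int begin;
        OpenTag(String name, int begin) { this.name = name; this.begin = begin; }
    }

    // Hypothetical stand-in for a finished annotation covering [begin, end).
    static final class Span {
        final String name;
        final int begin;
        final int end;
        Span(String name, int begin, int end) { this.name = name; this.begin = begin; this.end = end; }
        @Override public String toString() { return name + "[" + begin + "," + end + ")"; }
    }

    // Closes one open tag named `name` (matching by name is an assumption).
    // useDescending=true walks newest-to-oldest (the proposed fix);
    // useDescending=false walks oldest-to-newest (the reported behavior).
    static Span endElement(Deque<OpenTag> started, String name, int end, boolean useDescending) {
        Iterator<OpenTag> it = useDescending ? started.descendingIterator() : started.iterator();
        while (it.hasNext()) {
            OpenTag tag = it.next();
            if (tag.name.equals(name)) {
                it.remove(); // pop the matched tag off the stack of open tags
                return new Span(name, tag.begin, end);
            }
        }
        return null; // end tag with no matching start tag
    }

    // Simulates the events for "<div>a<div>b</div>c</div>".
    static List<Span> run(boolean useDescending) {
        Deque<OpenTag> started = new LinkedList<>();
        List<Span> spans = new ArrayList<>();
        started.addLast(new OpenTag("div", 0)); // outer <div>
        started.addLast(new OpenTag("div", 1)); // inner <div>, after "a"
        spans.add(endElement(started, "div", 2, useDescending)); // inner </div>, after "b"
        spans.add(endElement(started, "div", 3, useDescending)); // outer </div>, after "c"
        return spans;
    }

    public static void main(String[] args) {
        System.out.println("iterator():           " + run(false)); // [div[0,2), div[1,3)]
        System.out.println("descendingIterator(): " + run(true));  // [div[1,2), div[0,3)]
    }
}
```

With iterator(), the inner `</div>` closes the outer element, producing the overlapping spans [0,2) and [1,3). With descendingIterator(), the tags are closed in LIFO order and the spans nest correctly as [1,2) inside [0,3).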
