[ 
https://issues.apache.org/jira/browse/TIKA-2150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15615786#comment-15615786
 ] 

Tim Allison commented on TIKA-2150:
-----------------------------------

Thank you for opening this, submitting a minimal file, and even diagnosing 
the problem!  I'm not yet sure how best to fix this.  We rely on "texty" signals 
("par", etc.) to determine that we're no longer in the header, and I worry that 
a stricter parse would have unintended consequences.  I'll dig some more.  
Thank you, again.
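
For illustration only, here is a toy Python sketch of how a header heuristic 
of this kind can swallow leading content. It is not Tika's TextExtractor code: 
the TEXTY_SIGNALS set, the extract_text function, and the sample RTF string are 
all made up for this example, and the attached file may fail for a different 
reason.

```python
import re

# Hypothetical set of "texty" control words; NOT taken from Tika's source.
TEXTY_SIGNALS = {"par", "pard", "plain"}

# A token is either a control word such as \par (with an optional numeric
# parameter and one optional trailing space) or a run of plain text.
# Group braces match neither alternative and are simply skipped.
TOKEN = re.compile(r"\\([a-z]+)(-?\d+)? ?|([^\\{}]+)")


def extract_text(rtf: str) -> str:
    """Toy extractor: keep text only after the first texty signal is seen."""
    in_header = True
    out = []
    for match in TOKEN.finditer(rtf):
        word, _param, text = match.groups()
        if word is not None:
            # Any texty control word ends the (assumed) header.
            if word in TEXTY_SIGNALS:
                in_header = False
        elif not in_header:
            out.append(text)
        # Text seen while still "in the header" is silently dropped.
    return "".join(out)


# Made-up sample: content appears before any texty signal.
sample = r"{\rtf1\ansi TO FROM\par Subject line\par}"
print(extract_text(sample))  # prints "Subject line"; "TO FROM" is lost
```

In this sketch the words before the first texty control word are discarded 
because nothing has yet signalled the end of the header. Parsing the header 
strictly would avoid that, at the risk of rejecting files that the lenient 
heuristic currently handles.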

> RTF TextExtractor omits some content
> ------------------------------------
>
>                 Key: TIKA-2150
>                 URL: https://issues.apache.org/jira/browse/TIKA-2150
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.13
>            Reporter: T. Schmidt
>         Attachments: bi16tabe.000
>
>
> The TextExtractor class seems to treat the first two content words (TO FROM) 
> in the provided file as if they belonged to the header. They are missing 
> from the text output.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
