[ 
https://issues.apache.org/jira/browse/TIKA-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksandr Dubinsky updated TIKA-1309:
-------------------------------------

    Description: RTF files (such as those produced by WordPad) often encode 
consecutive linebreaks as simply consecutive \par commands. However, 
org.apache.tika.parser.rtf.TextExtractor ignores the second \par. Solution is 
very simple. See attached patch.  (was: RTF files (such as those produced by 
WordPad) typically encode consecutive linebreaks as simply consecutive \par 
commands. However, org.apache.tika.parser.rtf.TextExtractor ignores the second 
\par. Solution is very simple. See attached patch.)

> RTF TextExtractor ignores consecutive linebreaks
> ------------------------------------------------
>
>                 Key: TIKA-1309
>                 URL: https://issues.apache.org/jira/browse/TIKA-1309
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.5, 1.6
>            Reporter: Aleksandr Dubinsky
>         Attachments: 0001-fix-RTF-ignores-consecutive-newlines.patch, test.rtf
>
>   Original Estimate: 0h
>  Remaining Estimate: 0h
>
> RTF files (such as those produced by WordPad) often encode consecutive 
> linebreaks as simply consecutive \par commands. However, 
> org.apache.tika.parser.rtf.TextExtractor ignores the second \par. Solution is 
> very simple. See attached patch.
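
For illustration only, here is a minimal sketch of the expected behaviour: each \par control word should yield its own line break, so two consecutive \par commands produce an empty line between paragraphs. This is not Tika's TextExtractor and not the attached patch. The class ParDemo and the method extractText are hypothetical names, and the scanner assumes input containing only plain text and alphabetic control words (no groups, control symbols, or numeric parameters).

```java
// Toy scanner for illustration; NOT org.apache.tika.parser.rtf.TextExtractor.
// ParDemo and extractText are hypothetical names introduced for this sketch.
class ParDemo {

    /** Copies plain text through and emits one '\n' for every \par control word. */
    static String extractText(String rtfBody) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < rtfBody.length()) {
            char c = rtfBody.charAt(i);
            if (c == '\\') {
                // A control word is the run of letters after the backslash.
                int j = i + 1;
                while (j < rtfBody.length() && Character.isLetter(rtfBody.charAt(j))) {
                    j++;
                }
                String word = rtfBody.substring(i + 1, j);
                if (word.equals("par")) {
                    // Every \par counts, including one that directly follows another.
                    out.append('\n');
                }
                // A single space after a control word is a delimiter, not text.
                if (j < rtfBody.length() && rtfBody.charAt(j) == ' ') {
                    j++;
                }
                i = j;
            } else if (c == '\r' || c == '\n') {
                // Raw line breaks in the RTF source are not part of the text.
                i++;
            } else {
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }
}
```

With this sketch, the input first\par\par second yields "first", an empty line, and then "second". An extractor that drops the second \par would instead place "second" directly on the line after "first", which is the behaviour reported here.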



--
This message was sent by Atlassian JIRA
(v6.2#6252)
