[ https://issues.apache.org/jira/browse/TIKA-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Aleksandr Dubinsky updated TIKA-1309:
-------------------------------------
Description: RTF files (such as those produced by WordPad) often encode
consecutive linebreaks simply as consecutive \par commands. However,
org.apache.tika.parser.rtf.TextExtractor ignores the second \par. The solution
is very simple. See the attached patch. (was: RTF files (such as those produced
by WordPad) typically encode consecutive linebreaks simply as consecutive \par
commands. However, org.apache.tika.parser.rtf.TextExtractor ignores the second
\par. The solution is very simple. See the attached patch.)
> RTF TextExtractor ignores consecutive linebreaks
> ------------------------------------------------
>
> Key: TIKA-1309
> URL: https://issues.apache.org/jira/browse/TIKA-1309
> Project: Tika
> Issue Type: Bug
> Components: parser
> Affects Versions: 1.5, 1.6
> Reporter: Aleksandr Dubinsky
> Attachments: 0001-fix-RTF-ignores-consecutive-newlines.patch, test.rtf
>
> Original Estimate: 0h
> Remaining Estimate: 0h
>
> RTF files (such as those produced by WordPad) often encode consecutive
> linebreaks simply as consecutive \par commands. However,
> org.apache.tika.parser.rtf.TextExtractor ignores the second \par. The
> solution is very simple. See the attached patch.
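To illustrate the expected behavior, here is a minimal, hypothetical Java sketch. The class `ParSketch` and its `extract` method are invented for illustration. This is not Tika's `TextExtractor` and not the attached patch, whose contents are not shown here. It handles only a tiny subset of RTF: it ignores group state and destinations, does not decode hex (`\'hh`) or Unicode escapes, and discards every control word except `\par`. Each `\par` emits its own newline, so two consecutive `\par` commands produce an empty line, which is the output the report expects.

```java
/**
 * Hypothetical sketch, NOT Tika's TextExtractor and NOT the attached patch.
 * Extracts plain text from a tiny subset of RTF and emits one '\n' for every
 * \par control word, so consecutive \par commands produce blank lines.
 */
public class ParSketch {

    private static boolean isAsciiLetter(char ch) {
        return (ch >= 'a' && ch <= 'z') || (ch >= 'A' && ch <= 'Z');
    }

    private static boolean isAsciiDigit(char ch) {
        return ch >= '0' && ch <= '9';
    }

    public static String extract(String rtf) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        int n = rtf.length();
        while (i < n) {
            char c = rtf.charAt(i);
            if (c == '{' || c == '}' || c == '\r' || c == '\n') {
                // Group braces and raw CR/LF in the RTF source are not text.
                // (This sketch does not track groups or destinations.)
                i++;
            } else if (c == '\\') {
                i++;
                if (i >= n) {
                    break; // dangling backslash at end of input
                }
                char next = rtf.charAt(i);
                if (next == '\\' || next == '{' || next == '}') {
                    out.append(next); // escaped literal character
                    i++;
                } else if (!isAsciiLetter(next)) {
                    i++; // other control symbols (\~, \-, \'hh, ...) are not handled
                } else {
                    // Control word: ASCII letters, then an optional signed integer.
                    int start = i;
                    while (i < n && isAsciiLetter(rtf.charAt(i))) {
                        i++;
                    }
                    String word = rtf.substring(start, i);
                    int p = i;
                    if (p < n && rtf.charAt(p) == '-') {
                        p++;
                    }
                    if (p < n && isAsciiDigit(rtf.charAt(p))) {
                        i = p;
                        while (i < n && isAsciiDigit(rtf.charAt(i))) {
                            i++;
                        }
                    }
                    // One space after a control word is its delimiter, not text.
                    if (i < n && rtf.charAt(i) == ' ') {
                        i++;
                    }
                    if (word.equals("par")) {
                        // Every \par emits its own newline; nothing collapses
                        // a run of them, so \par\par yields one blank line.
                        out.append('\n');
                    }
                    // All other control words are ignored in this sketch.
                }
            } else {
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String rtf = "{\\rtf1\\ansi Line one\\par\\par Line three\\par}";
        System.out.print(extract(rtf));
    }
}
```

With the sample string in `main`, the sketch prints `Line one`, an empty line, then `Line three`. An extractor that drops the second `\par` would print the two lines back to back instead. A real RTF parser must also skip destination groups such as the font table, which this sketch deliberately leaves out.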
--
This message was sent by Atlassian JIRA
(v6.2#6252)