[ 
https://issues.apache.org/jira/browse/TIKA-1544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mortee updated TIKA-1544:
-------------------------
    Description: I'm trying to extract the text content from RTF documents. The
files contain empty lines (two or more consecutive paragraph-end marks), on
which the further processing relies to tell apart different parts of the text.
But unfortunately Tika (with the --text switch) eliminates all those empty
lines instead of preserving them.  (was: I'm trying to extract the text content
from RTF documents. The files contain empty lines, on which the further
processing relies to tell apart different parts of the text. But unfortunately
Tika (with the --text switch) eliminates all those empty lines instead of
preserving them.)

> empty lines are not preserved
> -----------------------------
>
>                 Key: TIKA-1544
>                 URL: https://issues.apache.org/jira/browse/TIKA-1544
>             Project: Tika
>          Issue Type: Bug
>    Affects Versions: 1.6
>         Environment: Windows 8, Java 1.8
>            Reporter: mortee
>            Priority: Minor
>
> I'm trying to extract the text content from RTF documents. The files contain
> empty lines (two or more consecutive paragraph-end marks), on which the
> further processing relies to tell apart different parts of the text. But
> unfortunately Tika (with the --text switch) eliminates all those empty lines
> instead of preserving them.
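
To illustrate why the missing empty lines matter for the further processing
mentioned above, here is a minimal sketch of a splitter that treats runs of
empty lines as section separators. The class name BlankLineSections and the
sample strings are hypothetical and are not taken from the report or from
Tika; the code only assumes that the extracted text arrives as a plain Java
String.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Tika: splits extracted text into
// sections wherever one or more empty lines occur.
public class BlankLineSections {

    static List<String> split(String text) {
        List<String> sections = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        // \R matches any line break (\n, \r\n, ...); -1 keeps trailing empties.
        for (String line : text.split("\\R", -1)) {
            if (line.trim().isEmpty()) {
                // An empty line closes the section collected so far, if any.
                if (current.length() > 0) {
                    sections.add(current.toString());
                    current.setLength(0);
                }
            } else {
                if (current.length() > 0) {
                    current.append('\n');
                }
                current.append(line);
            }
        }
        if (current.length() > 0) {
            sections.add(current.toString());
        }
        return sections;
    }

    public static void main(String[] args) {
        // Made-up sample text with empty lines preserved.
        String preserved = "Title\n\nFirst part\nstill first\n\nSecond part\n";
        // The same text with the empty lines dropped.
        String collapsed = "Title\nFirst part\nstill first\nSecond part\n";
        System.out.println(split(preserved).size());
        System.out.println(split(collapsed).size());
    }
}
```

With the empty lines preserved the sample splits into three sections; once
they are dropped, everything merges into a single section, which is the kind
of failure the description refers to.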



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
