[jira] [Resolved] (TIKA-2500) Apache Tika do not extract first line of the RTF file, It only extract last three char of first line.

Tim Allison (JIRA) Wed, 29 May 2019 08:26:36 -0700


     [ 
https://issues.apache.org/jira/browse/TIKA-2500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Tim Allison resolved TIKA-2500.
-------------------------------
       Resolution: Fixed
         Assignee: Tim Allison
    Fix Version/s: 1.22

Fixed via TIKA-2883

> Apache Tika do not extract first line of the RTF file, It only extract last 
> three char of first line.
> -----------------------------------------------------------------------------------------------------
>
>                 Key: TIKA-2500
>                 URL: https://issues.apache.org/jira/browse/TIKA-2500
>             Project: Tika
>          Issue Type: Bug
>    Affects Versions: 1.16
>            Reporter: Rohit Sureshrao Shelhalkar
>            Assignee: Tim Allison
>            Priority: Major
>              Labels: RTF, RTFParser
>             Fix For: 1.22
>
>
> When I am parsing the RTF file, it only prints last three characters of the 
> first line.Remaining lines are printed properly.
> Following is the RTF file text. Please copy the text and save as RTF format.
> {\rtf1\ansi\ansicpg1252\uc1\deff0 {\rtf1\ansi\deff0 {\fonttbl{\f0 Arial;}}\f0 
> {\line {\b Level1} : \par} {\line This is level1 paragraph\line This is 
> level1 paragraph\line This is level1 paragraph\line This is level1 
> paragraph\line This is level1 paragraph\line This is level1 paragraph\par} 
> {\line {\b Level2} : \par} This is level2 paragraph.\line This is level2 
> paragraph.\line This is level2 paragraph.\par} } \line \par}
> When you save this text as RTF file it will look like this.
> [https://i.stack.imgur.com/p1rWe.png]
> I have parsed above RTF from the following code.
> {code:java}
> BodyContentHandler handler = new BodyContentHandler();
> Metadata metadata = new Metadata();
> FileInputStream inputstream = new FileInputStream(new 
> File("level1Missing.rtf"));
> ParseContext pcontext = new ParseContext();
> RTFParser rt = new RTFParser();
> rt.parse(inputstream, handler, metadata, pcontext);
> //getting the content of the document
> System.out.println("Contents of the PDF :\n\n" + handler.toString());
> {code}
> The output of the above code is following.
> https://i.stack.imgur.com/cJYuQ.png



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Resolved] (TIKA-2500) Apache Tika do not extract first line of the RTF file, It only extract last three char of first line.

Reply via email to