[jira] [Commented] (TIKA-1318) Use of Deprecated Word6Extractor.getParagraphText() Method

Nick Burch (JIRA) Tue, 03 Jun 2014 09:15:23 -0700

    [ 
https://issues.apache.org/jira/browse/TIKA-1318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14016840#comment-14016840
 ]


Nick Burch commented on TIKA-1318:
----------------------------------

It might make sense to switch this to a call to WordToHtmlConverter, which is 
able to work with a HWPFOldDocument. Currently, we're calling WordExtractor, 
which in turn passes the HWPFOldDocument to WordToTextConverter, so we lose 
out on a bit of formatting.

> Use of Deprecated Word6Extractor.getParagraphText() Method
> ----------------------------------------------------------
>
>                 Key: TIKA-1318
>                 URL: https://issues.apache.org/jira/browse/TIKA-1318
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.5
>            Reporter: Tyler Palsulich
>            Priority: Minor
>              Labels: deprecation
>             Fix For: 1.6
>
>
> org.apache.tika.parser.microsoft.WordExtractor.parseWord6() uses the 
> deprecated Word6Extractor.getParagraphText() method. getParagraphText() is 
> supposed to return a String[] with an element for each paragraph in the text. 
> The replacement is getText(), which leaves the separation of paragraphs, 
> cells, etc. implementation-specific. I'm not sure, at this point, how the POI 
> WordExtractor separates them.
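The replacement getText() returns a single String, whereas getParagraphText() returned a String[]. Calling code that still wants per-paragraph output could rebuild the array itself. The following is a minimal sketch of that idea. The class ParagraphSplitter and its method splitParagraphs are hypothetical and are not part of Tika or POI. The sketch also assumes that getText() separates paragraphs with line breaks (\r\n, \n or a bare \r). That assumption is unconfirmed; as the report notes, the actual separator has not been checked.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper, not part of Tika or POI: rebuilds a per-paragraph
 * array from the single String returned by getText().
 * ASSUMPTION: paragraphs are separated by line breaks; this is unverified.
 */
class ParagraphSplitter {

    static String[] splitParagraphs(String text) {
        if (text == null) {
            return new String[0];
        }
        List<String> paragraphs = new ArrayList<>();
        // Treat CRLF, LF and a bare CR (Word's native paragraph mark) as
        // paragraph breaks; CRLF is listed first so it counts as one break.
        for (String line : text.split("\r\n|\n|\r")) {
            // Skip blank lines left by consecutive breaks, but keep the
            // paragraph text itself unchanged (no trimming).
            if (!line.trim().isEmpty()) {
                paragraphs.add(line);
            }
        }
        return paragraphs.toArray(new String[0]);
    }

    public static void main(String[] args) {
        // Sample input standing in for Word6Extractor.getText() output.
        String text = "First paragraph.\r\nSecond paragraph.\n\nThird\rparagraph.";
        for (String p : splitParagraphs(text)) {
            System.out.println("[" + p + "]");
        }
    }
}
```

Dropping blank lines means an intentionally empty paragraph disappears from the result. That is a design choice in this sketch, and the deprecated method may have behaved differently.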



--
This message was sent by Atlassian JIRA
(v6.2#6252)
