[ https://issues.apache.org/jira/browse/TIKA-1318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14016840#comment-14016840 ]
Nick Burch commented on TIKA-1318:
----------------------------------
It might make sense to replace this with a call to WordToHtmlConverter, which is
able to work with a HWPFOldDocument. Currently, we're calling WordExtractor,
which in turn passes the HWPFOldDocument to WordToTextConverter, so we lose
out on a bit of formatting.
> Use of Deprecated Word6Extractor.getParagraphText() Method
> ----------------------------------------------------------
>
> Key: TIKA-1318
> URL: https://issues.apache.org/jira/browse/TIKA-1318
> Project: Tika
> Issue Type: Bug
> Components: parser
> Affects Versions: 1.5
> Reporter: Tyler Palsulich
> Priority: Minor
> Labels: deprecation
> Fix For: 1.6
>
>
> org.apache.tika.parser.microsoft.WordExtractor.parseWord6() uses the
> deprecated Word6Extractor.getParagraphText() method. getParagraphText() is
> supposed to return a String[] with an element for each paragraph in the text.
> The replacement is getText(), which leaves the separation of paragraphs,
> cells, etc. implementation-specific. I'm not sure, at this point, how the POI
> WordExtractor separates them.
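If per-paragraph strings are still needed after moving to getText(), one option is to split the returned text on line breaks. The following is only a minimal sketch: it assumes paragraphs are separated by line breaks, which the description above notes is implementation-specific and has not been confirmed. ParagraphSplitter and splitParagraphs are hypothetical names, not part of Tika or POI, and the sample string merely stands in for whatever getText() returns.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Tika or POI.
final class ParagraphSplitter {

    // Splits extracted text into paragraphs, ASSUMING paragraphs are
    // separated by line breaks (\r\n, \n, or \r). Blank segments are dropped.
    static List<String> splitParagraphs(String text) {
        List<String> paragraphs = new ArrayList<>();
        if (text == null) {
            return paragraphs;
        }
        for (String segment : text.split("\\r?\\n|\\r")) {
            if (!segment.trim().isEmpty()) {
                paragraphs.add(segment);
            }
        }
        return paragraphs;
    }

    public static void main(String[] args) {
        // Stand-in for the String an extractor's getText() might return.
        String text = "First paragraph\r\nSecond paragraph\n\nThird paragraph\r";
        for (String paragraph : splitParagraphs(text)) {
            System.out.println(paragraph);
        }
    }
}
```

If the extractor turns out to use a different separator (for example for table cells), only the split pattern would need to change.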