[ https://issues.apache.org/jira/browse/NUTCH-1154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andrzej Bialecki updated NUTCH-1154:
-------------------------------------
Attachment: NUTCH-1154.diff
Patch to upgrade to Tika 0.10. Unfortunately, TestRTFParser fails with this
version of Tika: the extracted text body is empty (see TIKA-748). Still,
I think the improvements in the PDF and Office parsers are worth the upgrade.
> Upgrade to Tika 0.10
> --------------------
>
> Key: NUTCH-1154
> URL: https://issues.apache.org/jira/browse/NUTCH-1154
> Project: Nutch
> Issue Type: Improvement
> Components: parser
> Affects Versions: 1.4
> Reporter: Andrzej Bialecki
> Attachments: NUTCH-1154.diff
>
>
> There have been significant improvements in Tika 0.10 and it would be nice to
> use the latest Tika in 1.4.