[ 
https://issues.apache.org/jira/browse/NUTCH-1925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14325756#comment-14325756
 ] 

Sebastian Nagel commented on NUTCH-1925:
----------------------------------------

Hi [~tpalsulich], the patch breaks the parsing of XLSX files 
("src/testresources/test-mime-util/test.xlsx", cf. NUTCH-1605): the parser 
needs the additional hints from the URL (file name) and the content type sent 
in the HTTP response header. It is also good to keep plugins the same (as far 
as possible) between trunk and 2.x. What exactly is going wrong needs further 
investigation; a unit test for the XLSX parser would be nice to have.
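
To illustrate why such hints can matter, here is a minimal sketch in plain 
Java. It is not Tika's or Nutch's actual detection code; the class and method 
names (MimeHintSketch, detectByMagic, detect) are hypothetical. It assumes 
only that an XLSX file is a ZIP container, so its leading bytes alone look 
like a generic ZIP archive, and that the file name or the declared 
Content-Type is what narrows the type down.

```java
// Hypothetical illustration only: NOT the Tika or Nutch implementation.
class MimeHintSketch {
    static final String ZIP = "application/zip";
    static final String XLSX =
        "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet";
    static final String OCTET = "application/octet-stream";

    /** Magic bytes only: a ZIP local file header starts with 'P' 'K' 0x03 0x04. */
    static String detectByMagic(byte[] head) {
        boolean zip = head != null && head.length >= 4
            && head[0] == 'P' && head[1] == 'K' && head[2] == 3 && head[3] == 4;
        return zip ? ZIP : OCTET;
    }

    /** Refine the magic-byte result with the file name and the declared type. */
    static String detect(byte[] head, String fileName, String declaredType) {
        String byMagic = detectByMagic(head);
        if (!ZIP.equals(byMagic)) {
            return byMagic; // not a ZIP container, so the hints are ignored here
        }
        // Hint 1: the file name taken from the URL.
        if (fileName != null
            && fileName.toLowerCase(java.util.Locale.ROOT).endsWith(".xlsx")) {
            return XLSX;
        }
        // Hint 2: the Content-Type from the HTTP response header.
        // equalsIgnoreCase(null) returns false, so a missing header is safe.
        if (XLSX.equalsIgnoreCase(declaredType)) {
            return XLSX;
        }
        return ZIP; // without hints the content stays a generic ZIP archive
    }
}
```

In this sketch, detect(head, null, null) on ZIP bytes yields only 
"application/zip", while passing "test.xlsx" as the file name or the XLSX 
media type as the declared type yields the spreadsheet type. A unit test for 
the XLSX parser could check the analogous behavior against the real plugin.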

> Upgrade Tika to version 1.7
> ---------------------------
>
>                 Key: NUTCH-1925
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1925
>             Project: Nutch
>          Issue Type: Improvement
>          Components: build
>            Reporter: Tyler Palsulich
>            Assignee: Markus Jelsma
>            Priority: Blocker
>             Fix For: 1.10, 2.3.1
>
>         Attachments: NUTCH-1925-2x.patch, NUTCH-1925.palsulich.p2.patch, 
> NUTCH-1925.palsulich.patch, NUTCH-1925.palsulich.v2.patch
>
>
> Hi Folks. Nutch currently uses version 1.6 of Tika. There were no significant 
> API changes between 1.6 and 1.7, so this should be a one-line update.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
