[ https://issues.apache.org/jira/browse/NUTCH-1925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tyler Palsulich reopened NUTCH-1925:
------------------------------------

Reopening since this is causing a test failure in the 2.x branch:
{code}
java.lang.NullPointerException
        at org.apache.nutch.parse.tika.TestImageMetadata.testIt(TestImageMetadata.java:73)
{code}

The relevant lines of the test are:
{code}
      parse = new ParseUtil(conf).parse(urlString, page);
      ByteBuffer bbufW = page.getMetadata().get(new Utf8("width"));
      byte[] byteArrayW = new byte[bbufW.remaining()];  // <-- NPE
{code}

{{page.getMetadata().keySet()}} does not contain "width" or "height", but both 
are extracted when running Tika directly (and on the 1.x branch).
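
For reference, a minimal sketch of a null guard that would turn the bare NPE into a message listing the keys that are actually present. It is only an illustration under assumptions: a plain Map<String, ByteBuffer> stands in for the page metadata (the real map is keyed by Utf8), {{requireString}} is a hypothetical helper that does not exist in Nutch, and the value "121" is made up.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

class MetadataGuard {

    // Hypothetical helper (not part of Nutch): returns the value stored under
    // `key` decoded as UTF-8, or fails with a message that lists the keys
    // present in the map instead of throwing a bare NullPointerException.
    static String requireString(Map<String, ByteBuffer> metadata, String key) {
        ByteBuffer buf = metadata.get(key);
        if (buf == null) {
            throw new IllegalStateException(
                "Missing metadata key '" + key + "'; present keys: " + metadata.keySet());
        }
        // Read from a duplicate so the caller's buffer position is left untouched.
        ByteBuffer copy = buf.duplicate();
        byte[] bytes = new byte[copy.remaining()];
        copy.get(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Stand-in for page.getMetadata(); the value is a placeholder.
        Map<String, ByteBuffer> metadata = new HashMap<>();
        metadata.put("width", ByteBuffer.wrap("121".getBytes(StandardCharsets.UTF_8)));

        System.out.println(requireString(metadata, "width"));
        try {
            requireString(metadata, "height"); // key is absent in this stand-in map
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

A guard like this would not fix the missing keys, but it would make the test report which metadata was actually extracted.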

I'm investigating why right now, but I seem to be going in circles.

> Upgrade Tika to version 1.7
> ---------------------------
>
>                 Key: NUTCH-1925
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1925
>             Project: Nutch
>          Issue Type: Improvement
>          Components: build
>            Reporter: Tyler Palsulich
>            Assignee: Markus Jelsma
>            Priority: Blocker
>             Fix For: 1.10, 2.3.1
>
>         Attachments: NUTCH-1925-2x.patch, NUTCH-1925.palsulich.patch, 
> NUTCH-1925.palsulich.v2.patch
>
>
> Hi Folks. Nutch currently uses version 1.6 of Tika. There were no significant 
> API changes between 1.6 and 1.7, so this should be a one-line update.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
