Mark Aragon created NUTCH-2742:

             Summary: Unable to parse specific pdf file
                 Key: NUTCH-2742
             Project: Nutch
          Issue Type: Bug
          Components: nutchNewbie, parser
    Affects Versions: 1.15
            Reporter: Mark Aragon

It appears that the Tika plugin is not parsing some PDF files.

An example is 

When I completed a dump of the 



Recno:: 0

Version: 7
Status: 1 (db_unfetched)
Fetch time: Mon Oct 07 00:00:37 AEDT 2019
Modified time: Thu Jan 01 10:00:00 AEST 1970
Retries since fetch: 0
Retry interval: 2592000 seconds (30 days)
Score: 1.0
Signature: null




This message was sent by Atlassian Jira
