feng ye created TIKA-2643:
-----------------------------
Summary: Tika call hangs when processing a PDF on Cloudera Hadoop
Key: TIKA-2643
URL: https://issues.apache.org/jira/browse/TIKA-2643
Project: Tika
Issue Type: Bug
Components: parser
Affects Versions: 1.17
Environment: Cloudera Hadoop 5.8
Reporter: feng ye
Fix For: 1.17
Attachments: hang-stdout.txt, hang.zip, testJournalParser.pdf
Tika.parseToString(InputStream) hangs when called within a MapReduce job to
process a PDF file on Cloudera Hadoop 5.8 (also observed on 5.4). Other PDF
files are processed successfully on the same cluster. I am attaching the file
along with the syslog and stdout logs. Interestingly, the same file is
processed without problems on a Hortonworks cluster.
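
Until the root cause is found, one possible way to keep a mapper from blocking
indefinitely is to run the parse call on a separate thread with a timeout. The
following is only a sketch under stated assumptions: TimedParse and
runWithTimeout are hypothetical names, not part of Tika or Hadoop, and the
120-second limit in the usage comment is arbitrary. It does not fix the hang;
it only lets the task fail with a TimeoutException instead of waiting forever.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper (not part of Tika or Hadoop): runs a parse call on a
// worker thread and gives up after a fixed time.
final class TimedParse {
    private TimedParse() {}

    // Returns the parsed text, or throws TimeoutException if the call does
    // not finish in time. An exception thrown by the call itself surfaces
    // wrapped in an ExecutionException.
    static String runWithTimeout(Callable<String> parseCall, long timeoutSeconds)
            throws Exception {
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "timed-parse");
            t.setDaemon(true); // a stuck parser thread must not keep the JVM alive
            return t;
        });
        Future<String> result = executor.submit(parseCall);
        try {
            return result.get(timeoutSeconds, TimeUnit.SECONDS);
        } catch (TimeoutException e) {
            // Interrupts the worker; a parse stuck in a tight loop may ignore it.
            result.cancel(true);
            throw e;
        } finally {
            executor.shutdownNow();
        }
    }
}

// Assumed usage inside a mapper, with Tika on the classpath and `in` being
// the PDF's InputStream:
//   String text = TimedParse.runWithTimeout(
//           () -> new org.apache.tika.Tika().parseToString(in), 120);
```

Because the worker thread can only be interrupted, not killed, a parse that
never checks for interruption would keep consuming CPU until the task JVM
exits.
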
This issue blocks us from making our Tika-based feature available on Cloudera
clusters, a major flavor of Hadoop, so your timely attention would be very
much appreciated.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)