James Baker created TIKA-1297:
---------------------------------
Summary: Images not being extracted from PDFs
Key: TIKA-1297
URL: https://issues.apache.org/jira/browse/TIKA-1297
Project: Tika
Issue Type: Bug
Components: parser
Affects Versions: 1.5
Reporter: James Baker
Tika does not extract images embedded in PDF documents. I have tested this in
two ways: from the command line, where the -z option fails to extract any
images, and by inspecting the XHTML that Tika produces for the PDF, which
contains no image tags.
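
The XHTML check described above could be automated roughly as follows. This is
only a sketch, not part of Tika: it assumes the XHTML output has already been
saved to a string, and the names ImgCounter and count_img_tags are hypothetical
helpers introduced for illustration. It counts the image tags in the markup, so
a result of 0 for a PDF known to contain images would match the behaviour
reported here.

```python
from html.parser import HTMLParser


class ImgCounter(HTMLParser):
    """Counts <img> elements seen while parsing markup (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # Called for the open form: <img src="...">
        if tag == "img":
            self.count += 1

    def handle_startendtag(self, tag, attrs):
        # Called for the self-closing form: <img src="..."/>
        if tag == "img":
            self.count += 1


def count_img_tags(xhtml: str) -> int:
    """Return the number of <img> tags found in the given XHTML text."""
    parser = ImgCounter()
    parser.feed(xhtml)
    parser.close()
    return parser.count


if __name__ == "__main__":
    import sys

    # Usage (assumed workflow): pipe saved XHTML output into this script.
    print(count_img_tags(sys.stdin.read()))
```

A count of 0 only shows that no image tags were emitted; it does not say
whether the images were skipped by the parser or dropped later in the output.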
The images are extractable by PDFBox, so Tika should be able to extract them
and include them in the XHTML output.