[jira] [Created] (TIKA-1297) Images not being extracted from PDFs

James Baker (JIRA) Tue, 13 May 2014 02:02:30 -0700

James Baker created TIKA-1297:
---------------------------------

             Summary: Images not being extracted from PDFs
                 Key: TIKA-1297
                 URL: https://issues.apache.org/jira/browse/TIKA-1297
             Project: Tika
          Issue Type: Bug
          Components: parser
    Affects Versions: 1.5
            Reporter: James Baker



Tika 1.5 does not extract images embedded in PDF documents. I tested this in
two ways: from the command line, where the -z option extracts no images, and
by inspecting the XHTML that Tika produces for the PDF, which contains no
image tags.
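
The XHTML check described above can be automated. What follows is only a
minimal sketch: it assumes the XHTML has already been captured as a string
(for example from the tika-app -x option), and the class and method names are
hypothetical helpers, not part of Tika. It counts image tags, so a result of 0
for a PDF known to contain images reproduces the problem.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of Tika or PDFBox.
final class XhtmlImageCheck {

    // Matches the start of an <img> element; \b prevents matching longer tag names.
    private static final Pattern IMG_TAG = Pattern.compile("<img\\b");

    private XhtmlImageCheck() {
    }

    // Returns the number of <img> tags found in the given XHTML text.
    static int countImgTags(String xhtml) {
        Matcher matcher = IMG_TAG.matcher(xhtml);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }
}
```

A plain regular expression is used here rather than an XML parser so that the
check still works on partial or truncated output.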

PDFBox can extract these images, so Tika should be able to extract them too
and include them in the XHTML output.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
