[
https://issues.apache.org/jira/browse/TIKA-1268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14128855#comment-14128855
]
Jeremy Anderson commented on TIKA-1268:
---------------------------------------
I created the TIKA-1285 patch after making that comment, but never re-linked it
in. I did some work on TIKA-1285 last week to re-sync the snapshot builds of
the two projects, though things became even more complicated with PDFBox's
snapshot transitioning from Jempbox to Xmpbox. The current patch files for
that one should work with the snapshots, though Xmpbox's DomXmpParser needs
some refactoring to work properly with Tika's test files. I believe metadata
is being dropped for a few of Tika's test files.
> Extract images from PDF documents
> ---------------------------------
>
> Key: TIKA-1268
> URL: https://issues.apache.org/jira/browse/TIKA-1268
> Project: Tika
> Issue Type: New Feature
> Components: parser
> Reporter: Jukka Zitting
> Assignee: Jukka Zitting
> Fix For: 1.6
>
>
> It would be nice if images within PDF documents could be extracted much like
> embedded attachments are now being handled.