[jira] [Commented] (TIKA-1268) Extract images from PDF documents

Jeremy Anderson (JIRA) Wed, 10 Sep 2014 11:18:56 -0700

    [ 
https://issues.apache.org/jira/browse/TIKA-1268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14128855#comment-14128855
 ]


Jeremy Anderson commented on TIKA-1268:
---------------------------------------

I created the TIKA-1285 patch after making that comment, but never linked it 
back here.  I did some work on TIKA-1285 last week to re-sync the snapshot 
builds of the two projects, though things became even more complicated with 
PDFBox's snapshot transitioning from Jempbox to Xmpbox.  The current patch 
files for TIKA-1285 should work with the snapshots, though Xmpbox's 
DomXmpParser needs some refactoring to work properly with Tika's test files.  
I believe metadata is being dropped for a few of Tika's test files.

> Extract images from PDF documents
> ---------------------------------
>
>                 Key: TIKA-1268
>                 URL: https://issues.apache.org/jira/browse/TIKA-1268
>             Project: Tika
>          Issue Type: New Feature
>          Components: parser
>            Reporter: Jukka Zitting
>            Assignee: Jukka Zitting
>             Fix For: 1.6
>
>
> It would be nice if images within PDF documents could be extracted much like 
> embedded attachments are now being handled.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
