[jira] [Commented] (TIKA-1633) Can't extract .png images from pdf document

Tim Allison (JIRA) Wed, 20 May 2015 09:57:22 -0700

    [ 
https://issues.apache.org/jira/browse/TIKA-1633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14552660#comment-14552660
 ]


Tim Allison commented on TIKA-1633:
-----------------------------------

We're working towards allowing parser parameter configuration in the Tika 
config file.  Unfortunately, we don't have that yet.  By default (see [1] and 
TIKA-1294), Tika does not extract "inline" images as attachments.  Have you tried 
changing the values of the following parameters in the PDFParser.properties 
file under org.apache.tika.parser.pdf?

{noformat}
extractInlineImages
extractUniqueInlineImagesOnly
{noformat}
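
As a rough illustration of that change, here is a minimal sketch that uses only java.util.Properties to load properties text and set both parameters to true. The class name, the method name, and the sample file contents are assumptions made for this example; the real PDFParser.properties ships inside the Tika jar and would have to be updated there for the server to pick it up.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper, not part of Tika: it only shows which two keys to flip.
final class PdfParserPropsSketch {

    // Loads the given properties text and sets both inline-image parameters to "true".
    static Properties enableInlineImages(String originalText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(originalText));
        } catch (IOException e) {
            // Reading from an in-memory string should not fail; rethrow unchecked.
            throw new UncheckedIOException(e);
        }
        props.setProperty("extractInlineImages", "true");
        props.setProperty("extractUniqueInlineImagesOnly", "true");
        return props;
    }

    public static void main(String[] args) {
        // Assumed sample contents; the real file holds other keys as well.
        String sample = "extractInlineImages false\n"
                + "extractUniqueInlineImagesOnly false\n";
        Properties updated = enableInlineImages(sample);
        System.out.println("extractInlineImages=" + updated.getProperty("extractInlineImages"));
        System.out.println("extractUniqueInlineImagesOnly="
                + updated.getProperty("extractUniqueInlineImagesOnly"));
    }
}
```

Any other keys in the file are left untouched, so only the two parameters named above change.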

[1] 
http://mail-archives.apache.org/mod_mbox/tika-user/201505.mbox/%3cdm2pr09mb071346d01729fc9367308e94c7...@dm2pr09mb0713.namprd09.prod.outlook.com%3e

> Can't extract .png images from pdf document
> -------------------------------------------
>
>                 Key: TIKA-1633
>                 URL: https://issues.apache.org/jira/browse/TIKA-1633
>             Project: Tika
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 1.8
>            Reporter: Damiano
>
> Hello,
> I am running Tika with:
> *java -jar tika-server-1.8.jar*
> Then I need to extract images from a document, so I use:
> *curl -X PUT -H "Accept: application/zip" -T /home/damiano/html_images.pdf 
> http://localhost:9998/unpack/all > content.zip*
> In content.zip I only see:
> __METADATA__
> __TEXT__
> nothing else!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
