[
https://issues.apache.org/jira/browse/TIKA-1633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14552660#comment-14552660
]
Tim Allison commented on TIKA-1633:
-----------------------------------
We're working towards allowing parser parameters to be configured in the Tika
config file, but unfortunately that isn't available yet. By default (see [1] and
TIKA-1294), Tika does not extract "inline" images from PDFs as attachments,
which would explain why content.zip contains only the metadata and text
entries. Have you tried changing the values of the following parameters in the
PDFParser.properties file under org.apache.tika.parser.pdf?
{noformat}
extractInlineImages
extractUniqueInlineImagesOnly
{noformat}
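As a rough sketch (untested here, and assuming the file keeps the
whitespace-separated "key value" layout it has in the 1.8 source tree), turning
on inline image extraction would look something like this:
{noformat}
extractInlineImages true
extractUniqueInlineImagesOnly true
{noformat}
The values shown are only an illustration. The edited file would presumably
have to replace the copy bundled inside tika-server-1.8.jar, or sit ahead of it
on the classpath, before the server picks it up.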
[1]
http://mail-archives.apache.org/mod_mbox/tika-user/201505.mbox/%3cdm2pr09mb071346d01729fc9367308e94c7...@dm2pr09mb0713.namprd09.prod.outlook.com%3e
> Can't extract .png images from pdf document
> -------------------------------------------
>
> Key: TIKA-1633
> URL: https://issues.apache.org/jira/browse/TIKA-1633
> Project: Tika
> Issue Type: Bug
> Components: server
> Affects Versions: 1.8
> Reporter: Damiano
>
> Hello,
> I am running tika doing:
> *java -jar tika-server-1.8.jar*
> then I need to extract images from document, i use:
> *curl -X PUT -H "Accept: application/zip" -T /home/damiano/html_images.pdf
> http://localhost:9998/unpack/all > content.zip*
> In content.zip I only see:
> __METADATA__
> __TEXT__
> nothing else!