Nick Burch created TIKA-2374:
--------------------------------
Summary: Tika App -z should extract PDF inline images by default
Key: TIKA-2374
URL: https://issues.apache.org/jira/browse/TIKA-2374
Project: Tika
Issue Type: Improvement
Components: cli
Affects Versions: 1.14
Reporter: Nick Burch

As discussed on dev@: if you use the Tika App with the default config and the
{{-z}} extract option, it will extract embedded resources, except for PDF inline
images. This is unexpected for new users, who won't know that they need to
pass in a custom config with the {{extractInlineImages}} PDF parser option set.

If the user passes an explicit config to the app, we should respect that.
However, if they don't pass one in and take the default, the {{-z}} option (and
only that option) should enable whatever settings are needed to make extraction
work properly and fully (currently just {{extractInlineImages}}).

If possible/easy, the {{-z}} option should print out some info to let affected
users know that the default config was tweaked to extract extra embedded
resources.
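
The rules above could be sketched roughly as follows. This is only an illustration of the proposed decision logic, not Tika code: {{ExtractDefaultsSketch}}, {{PdfOptions}}, {{Resolution}} and {{resolve}} are hypothetical names, and the notice text is made up. It assumes {{extractInlineImages}} is off in the default config, as described above.

```java
import java.util.Optional;

// Hypothetical sketch: these types are stand-ins for illustration, not Tika classes.
class ExtractDefaultsSketch {

    /** Stand-in for the one PDF parser setting discussed in this issue. */
    static final class PdfOptions {
        final boolean extractInlineImages;

        PdfOptions(boolean extractInlineImages) {
            this.extractInlineImages = extractInlineImages;
        }
    }

    /** The options that end up in effect, plus an optional notice for the user. */
    static final class Resolution {
        final PdfOptions options;
        final Optional<String> notice;

        Resolution(PdfOptions options, Optional<String> notice) {
            this.options = options;
            this.notice = notice;
        }
    }

    /**
     * Decide the effective PDF options.
     *
     * @param extractMode true when the app was started with -z
     * @param userConfig  options from an explicit config, empty if none was passed
     */
    static Resolution resolve(boolean extractMode, Optional<PdfOptions> userConfig) {
        if (userConfig.isPresent()) {
            // An explicit config is always respected, even under -z.
            return new Resolution(userConfig.get(), Optional.empty());
        }
        if (extractMode) {
            // Default config plus -z: switch inline image extraction on and say so.
            return new Resolution(
                    new PdfOptions(true),
                    Optional.of("Note: -z enabled extractInlineImages because no config was supplied"));
        }
        // Default config without -z: leave the (assumed) default untouched.
        return new Resolution(new PdfOptions(false), Optional.empty());
    }

    public static void main(String[] args) {
        Resolution r = resolve(true, Optional.empty());
        System.out.println("extractInlineImages=" + r.options.extractInlineImages);
        // Send the notice to stderr so it does not mix with extracted output.
        r.notice.ifPresent(System.err::println);
    }
}
```

Keeping the decision in one place means the override applies only when no config was passed and {{-z}} is set, so other modes keep their current behaviour.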