[ https://issues.apache.org/jira/browse/TIKA-3348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tim Allison updated TIKA-3348:
------------------------------
Fix Version/s: 2.0.0-BETA (was: 2.0.0)
> Improve the workflow for extracting and returning images from PDFs and other
> containers using Tika Server.
> -----------------------------------------------------------------------------------------------------------
>
> Key: TIKA-3348
> URL: https://issues.apache.org/jira/browse/TIKA-3348
> Project: Tika
> Issue Type: Improvement
> Components: server
> Affects Versions: 1.25
> Reporter: Simon Lucy
> Priority: Major
> Fix For: 2.0.0-BETA
>
>
> There's a set of bumps in the road to navigate when extracting images from
> PDFs, retrieving them and managing the metadata using Tika Server.
> The first is knowing that /unpack will do the basic job and return the
> embedded objects in a zip file (provided inline-image extraction is enabled,
> i.e. setExtractInlineImages is true). Documenting this clearly on the Tika
> Server wiki page would help people enormously.
> But once the images have been extracted, processing them requires either
> inspecting them with another tool or calling /rmeta to get their MIME types
> and the rest of their metadata.
> This means making multiple passes over the same file, repeating the same
> extraction, identification and temporary storage each time.
> The /rmeta and /unpack server processes need to be merged. The simplest
> approach may be to write the /rmeta metadata into the __META__ file object
> added to the returned zip file. A more complicated, but perhaps more
> hypermedia-friendly, approach would be to use content negotiation and send
> Accept: application/zip with the /rmeta request.
> I've set the Fix Version to 2.0 because this is, if not a breaking change,
> at least a considerable one.
> I'm available to help with this, if that's useful.
>
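A minimal client-side sketch of the two workflows discussed in the issue, assuming a local Tika Server at http://localhost:9998 (an assumption, not from the issue). two_pass_requests builds, without sending, the two PUT requests the current workflow needs, so the same document is uploaded and parsed twice. The X-Tika-PDFextractInlineImages header is the per-request PDF parser override as Tika Server 1.x documents it; verify it against your server version. read_merged_zip illustrates the proposed single response. The __META__ entry name comes from the proposal, while its JSON layout (a list of /rmeta-style dicts keyed by "resourceName") is a hypothetical assumption, as are all helper names here.

```python
import io
import json
import urllib.request
import zipfile

TIKA_URL = "http://localhost:9998"  # assumption: a local Tika Server on its default port
META_ENTRY = "__META__"  # entry name from the proposal; its JSON layout is hypothetical


def build_request(endpoint: str, doc: bytes, accept: str) -> urllib.request.Request:
    """Build (but do not send) a PUT request for a Tika Server endpoint."""
    return urllib.request.Request(
        f"{TIKA_URL}/{endpoint}",
        data=doc,
        method="PUT",
        headers={
            # Per-request PDF parser override: also extract inline images.
            "X-Tika-PDFextractInlineImages": "true",
            "Accept": accept,
        },
    )


def two_pass_requests(doc: bytes):
    """Current workflow: the same document is sent, and parsed, twice."""
    unpack = build_request("unpack", doc, "application/zip")  # embedded files as a zip
    rmeta = build_request("rmeta", doc, "application/json")   # per-file metadata as JSON
    return unpack, rmeta


def read_merged_zip(zip_bytes: bytes) -> dict:
    """Proposed workflow (hypothetical): one zip carrying files plus metadata.

    Assumes META_ENTRY holds a JSON list of /rmeta-style dicts whose
    "resourceName" matches a zip entry name; returns {name: (bytes, metadata)}.
    """
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        names = zf.namelist()
        meta = json.loads(zf.read(META_ENTRY)) if META_ENTRY in names else []
        by_name = {m.get("resourceName"): m for m in meta}
        # Pair every extracted file with its metadata (empty dict if none found).
        return {
            name: (zf.read(name), by_name.get(name, {}))
            for name in names
            if name != META_ENTRY
        }
```

Against a running server, each request would be sent with urllib.request.urlopen(...); the point of the proposal is that read_merged_zip would then need only one round trip and one parse of the PDF.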
--
This message was sent by Atlassian Jira
(v8.3.4#803005)