[
https://issues.apache.org/jira/browse/TIKA-3703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison updated TIKA-3703:
------------------------------
Description:
For those who want more than just text and metadata, e.g. bytes for thumbnails,
or embedded images or embedded files or rendered pages, it would be great to
return that data in a standard format. Our current /unpack endpoint uses a zip
file but with our own "standard".
I was thinking about heading down the pure JSON option by including these byte
streams as base64-encoded metadata values in our current metadata object. Not
sure which is the better way to go.
I'm opening this issue to discuss options.
Reference: [https://frictionlessdata.io/standards/#standards-toolkit]
We'd want to make this available as an endpoint on tika-server (/v2/unpack
or something else?) and as a command-line option in tika-app.
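To make the two options concrete, here is a rough Python sketch, not a proposal
for the actual implementation. It assumes the parse result is a flat metadata
dict plus a dict of embedded file names to bytes. The function names, the
"X-TIKA:embedded_bytes" key and the "tika_metadata" property are all
hypothetical. The second function writes a zip containing a datapackage.json
descriptor with a "resources" array, as the Frictionless Data Package spec
describes, next to the raw files.

```python
import base64
import io
import json
import zipfile


def to_json_with_base64(metadata, embedded):
    """Option 1: inline each byte stream as base64 in the metadata object."""
    out = dict(metadata)
    # "X-TIKA:embedded_bytes" is a made-up key for illustration only.
    out["X-TIKA:embedded_bytes"] = {
        name: base64.b64encode(data).decode("ascii")
        for name, data in embedded.items()
    }
    return json.dumps(out)


def to_data_package_zip(metadata, embedded):
    """Option 2: zip with a datapackage.json descriptor plus the raw files."""
    descriptor = {
        "name": "tika-unpack",  # hypothetical package name
        "resources": [
            {"name": f"embedded-{i}", "path": path, "bytes": len(data)}
            for i, (path, data) in enumerate(embedded.items())
        ],
        # Custom property (assumption): carry the Tika metadata alongside.
        "tika_metadata": metadata,
    }
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("datapackage.json", json.dumps(descriptor, indent=2))
        for path, data in embedded.items():
            zf.writestr(path, data)
    return buf.getvalue()
```

The trade-off this shows: option 1 stays a single JSON document, but base64
inflates every byte stream by roughly a third. Option 2 keeps the bytes raw
and describes them in a standard descriptor, at the cost of a container format.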
was:
For those who want more than just text and metadata, e.g. bytes for thumbnails,
or embedded images or embedded files or rendered pages, it would be great to
return that data in a standard format. Our current /unpack endpoint uses a zip
file but with our own "standard".
I was thinking about heading down the pure json option by including these byte
streams as base64 encoded metadata values in our current metadata object. Not
sure which is the better way to go.
I'm opening this issue to discuss options.
https://frictionlessdata.io/standards/#standards-toolkit
> Consider adding a frictionless data package output format
> ---------------------------------------------------------
>
> Key: TIKA-3703
> URL: https://issues.apache.org/jira/browse/TIKA-3703
> Project: Tika
> Issue Type: Task
> Reporter: Tim Allison
> Priority: Major
>
> For those who want more than just text and metadata, e.g. bytes for
> thumbnails, or embedded images or embedded files or rendered pages, it would
> be great to return that data in a standard format. Our current /unpack
> endpoint uses a zip file but with our own "standard".
> I was thinking about heading down the pure JSON option by including these
> byte streams as base64-encoded metadata values in our current metadata
> object. Not sure which is the better way to go.
> I'm opening this issue to discuss options.
>
> Reference: [https://frictionlessdata.io/standards/#standards-toolkit]
> We'd want to make this available as an endpoint on tika-server
> (/v2/unpack or something else?) and as a command-line option in tika-app.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)