[
https://issues.apache.org/jira/browse/TIKA-3169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17178930#comment-17178930
]
Carina Antunes commented on TIKA-3169:
--------------------------------------
A small note: the docs have a bug in
[https://cwiki.apache.org/confluence/display/TIKA/TikaServer#TikaServer-Transfer-LayerCompression].
They should probably read something like the following (it is still unclear
whether we should send gzip or application/gzip):
---
If you want to {{gzip}} your files before sending them to {{tika-server}}, compress the file and add a {{Content-Encoding}} header:
{noformat}
gzip test_my_doc.pdf{noformat}
{noformat}
curl -T test_my_doc.pdf.gz -H "Content-Encoding: application/gzip" http://localhost:9998/rmeta{noformat}
If you want {{tika-server}} to compress the output of the parse:
{noformat}
curl -T test_my_doc.pdf.gz -H "Accept-Encoding: gzip, deflate" http://localhost:9998/rmeta --compressed{noformat}
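For anyone scripting this instead of using curl, here is a minimal Python sketch of the same two requests, using only the standard library. The helper names ({{build_rmeta_request}}, {{read_body}}) are made up for illustration and are not part of Tika. The {{Content-Encoding}} value is a parameter because this issue has not settled which value the docs should recommend; the default is {{gzip}} because that is the HTTP content-coding name, whereas {{application/gzip}} is a media type.

```python
import gzip
import urllib.request

# Same endpoint as the curl examples above (assumes a local tika-server).
TIKA_RMETA_URL = "http://localhost:9998/rmeta"


def build_rmeta_request(document: bytes,
                        content_encoding: str = "gzip",
                        accept_gzip: bool = False,
                        url: str = TIKA_RMETA_URL) -> urllib.request.Request:
    """Build a PUT request carrying a gzip-compressed document.

    PUT is used because `curl -T` uploads with PUT over HTTP.
    """
    headers = {"Content-Encoding": content_encoding}
    if accept_gzip:
        # Ask the server to compress its response; the caller must
        # decompress it afterwards (see read_body below).
        headers["Accept-Encoding"] = "gzip"
    return urllib.request.Request(url,
                                  data=gzip.compress(document),
                                  headers=headers,
                                  method="PUT")


def read_body(response) -> bytes:
    """Return the response body, undoing gzip if the server applied it."""
    body = response.read()
    if response.headers.get("Content-Encoding", "").lower() == "gzip":
        body = gzip.decompress(body)
    return body
```

To actually send it, pass the request to {{urllib.request.urlopen}} and hand the returned response to {{read_body}}. Building the request separately makes it easy to try both header values against the same document.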
> rmeta and Content-Encoding application/gzip vs gzip
> ---------------------------------------------------
>
> Key: TIKA-3169
> URL: https://issues.apache.org/jira/browse/TIKA-3169
> Project: Tika
> Issue Type: Bug
> Affects Versions: 1.24.1
> Reporter: Carina Antunes
> Priority: Minor
>
> If I send a gzipped PDF with `-H "Content-Encoding: application/gzip"` to rmeta, I
> get a different result than if I send it with `-H "Content-Encoding: gzip"`.
> The first adds an object to the response array with "Content-Type":
> "application/gzip":
> {code:java}
> [{
> "Content-Type": "application/gzip",
> "X-Parsed-By": [
> "org.apache.tika.parser.DefaultParser",
> "org.apache.tika.parser.pkg.CompressorParser"
> ],
> "X-TIKA:embedded_depth": "0",
> "X-TIKA:parse_time_millis": "31"
> },
> {
> ...
> "Content-Type": "application/pdf",
> ...
> }]{code}
> while the latter only returns the pdf object:
>
>
> {code:java}
> [{
> ...
> "Content-Type": "application/pdf",
> ...
> }]
> {code}
>
> Example:
>
> {code:java}
> $ gzip test.pdf
> $ curl -T test.pdf.gz http://localhost:9998/rmeta/text -H "Content-Encoding: application/gzip"
> {code}
>
> vs
>
> {code:java}
> $ curl -T test.pdf.gz http://localhost:9998/rmeta/text -H "Content-Encoding: gzip"
> {code}
> I am not sure whether this behaviour is intended.
> If no Content-Encoding header is sent, the result is the same as with "application/gzip":
> {code:java}
> $ curl -T test.pdf.gz http://localhost:9998/rmeta/text {code}
>
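The difference described above can be checked mechanically. The following is a small Python sketch, assuming the rmeta response is the JSON array shown in the issue; the function names are hypothetical and only the "Content-Type" field is inspected.

```python
import json


def content_types(rmeta_json: str) -> list:
    """List the Content-Type of every entry in an rmeta JSON response."""
    return [entry.get("Content-Type") for entry in json.loads(rmeta_json)]


def has_gzip_wrapper(rmeta_json: str) -> bool:
    """True if the first entry describes the gzip container, not the document."""
    types = content_types(rmeta_json)
    return bool(types) and types[0] == "application/gzip"
```

Running {{has_gzip_wrapper}} on the response to each of the three curl calls shows at a glance which header values make the server treat the upload as a gzip container.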
--
This message was sent by Atlassian Jira
(v8.3.4#803005)