Carina Antunes created TIKA-3169:
------------------------------------
Summary: rmeta and Content-Encoding application/gzip vs gzip
Key: TIKA-3169
URL: https://issues.apache.org/jira/browse/TIKA-3169
Project: Tika
Issue Type: Bug
Affects Versions: 1.24.1
Reporter: Carina Antunes
If I send a PDF to rmeta with `-H "Content-Encoding: application/gzip"`, I get
a different result than if I send it with `-H "Content-Encoding: gzip"`.
The first adds an object with "Content-Type": "application/gzip" to the
response array:
{code:java}
[{
"Content-Type": "application/gzip",
"X-Parsed-By": [
"org.apache.tika.parser.DefaultParser",
"org.apache.tika.parser.pkg.CompressorParser"
],
"X-TIKA:embedded_depth": "0",
"X-TIKA:parse_time_millis": "31"
},
{
...
"Content-Type": "application/pdf",
...
}]{code}
while the latter only returns the pdf object:
{code:java}
[{
...
"Content-Type": "application/pdf",
...
}]
{code}
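To compare the two responses mechanically, something like the following sketch could help. It assumes the rmeta body is a JSON array of metadata objects whose "Content-Type" is a plain string, as in the output above; the helper names `content_types` and `has_gzip_container` are hypothetical, and the sample bodies are trimmed to the fields shown in this report.

```python
import json


def content_types(rmeta_body):
    """Return the Content-Type of each metadata object in an rmeta JSON array."""
    return [entry.get("Content-Type") for entry in json.loads(rmeta_body)]


def has_gzip_container(rmeta_body):
    """True if the first object in the array describes the gzip container itself."""
    types = content_types(rmeta_body)
    return bool(types) and types[0] == "application/gzip"


# Sample bodies reduced to the Content-Type fields quoted in this report.
with_container = '[{"Content-Type": "application/gzip"}, {"Content-Type": "application/pdf"}]'
pdf_only = '[{"Content-Type": "application/pdf"}]'

print(content_types(with_container))  # ['application/gzip', 'application/pdf']
print(content_types(pdf_only))        # ['application/pdf']
```

With these helpers, the difference reported here reduces to `has_gzip_container` returning True for the first request and False for the second.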
Example:
{code:java}
$ gzip test.pdf
$ curl -T test.pdf.gz http://localhost:9998/rmeta/text -H "Content-Encoding: application/gzip"
{code}
vs
{code:java}
$ curl -T test.pdf.gz http://localhost:9998/rmeta/text -H "Content-Encoding: gzip"
{code}
Not sure if the behaviour is intended.
If no Content-Encoding header is sent, the default behaviour is the same as
with "application/gzip":
{code:java}
$ curl -T test.pdf.gz http://localhost:9998/rmeta/text
{code}
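The three curl variants can also be run from one script, which makes the results easier to compare side by side. This is only a sketch: it assumes a tika-server instance listening on localhost:9998 as in the commands above, and the function names `build_headers`, `fetch_content_types`, and `compare` are hypothetical. It uses `http.client` with PUT (which is what `curl -T` sends) so that no Content-Type header is added implicitly.

```python
import http.client
import json


def build_headers(content_encoding=None):
    """Headers for one request; no Content-Encoding header when the value is None."""
    if content_encoding is None:
        return {}
    return {"Content-Encoding": content_encoding}


def fetch_content_types(payload, content_encoding=None,
                        host="localhost", port=9998, path="/rmeta/text"):
    """PUT the payload to rmeta and return the Content-Type of each returned object."""
    conn = http.client.HTTPConnection(host, port)
    try:
        conn.request("PUT", path, body=payload,
                     headers=build_headers(content_encoding))
        entries = json.loads(conn.getresponse().read())
    finally:
        conn.close()
    return [entry.get("Content-Type") for entry in entries]


def compare(gz_path):
    """Send the same gzipped file with each header variant from this report."""
    with open(gz_path, "rb") as fh:
        payload = fh.read()
    variants = ("application/gzip", "gzip", None)
    return {str(enc): fetch_content_types(payload, enc) for enc in variants}
```

Calling `compare("test.pdf.gz")` against a running server returns one Content-Type list per header variant, keyed by the header value used ("None" for the request without the header).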