Carina Antunes created TIKA-3169:
------------------------------------

             Summary: rmeta and Content-Encoding application/gzip vs gzip
                 Key: TIKA-3169
                 URL: https://issues.apache.org/jira/browse/TIKA-3169
             Project: Tika
          Issue Type: Bug
    Affects Versions: 1.24.1
            Reporter: Carina Antunes


If I send a PDF to rmeta with `-H "Content-Encoding: application/gzip"`, I get 
a different result than if I send it with `-H "Content-Encoding: gzip"`.

The first adds an extra object to the response array, with "Content-Type": 
"application/gzip":
{code:java}
[{
 "Content-Type": "application/gzip",
 "X-Parsed-By": [
 "org.apache.tika.parser.DefaultParser",
 "org.apache.tika.parser.pkg.CompressorParser"
 ],
 "X-TIKA:embedded_depth": "0",
 "X-TIKA:parse_time_millis": "31"
 },
{
...
"Content-Type": "application/pdf",
...
}]{code}

while the latter returns only the PDF object:

{code:java}
[{
...
"Content-Type": "application/pdf",
...
}]
{code}
 

Example:

 
{code:java}
$ gzip test.pdf
$ curl -T test.pdf.gz http://localhost:9998/rmeta/text -H "Content-Encoding: application/gzip"
{code}
 

vs 

 
{code:java}
$ curl -T test.pdf.gz http://localhost:9998/rmeta/text -H "Content-Encoding: gzip"
{code}
Not sure if the behaviour is intended. 

If no Content-Encoding header is sent, the default behaviour is the same as with "application/gzip":
{code:java}
$ curl -T test.pdf.gz http://localhost:9998/rmeta/text  {code}
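
For anyone trying to reproduce this, the three cases above could be compared side by side with a small script. This is only a rough sketch: the host, port and path are taken from the curl examples, while the helper names (`headers_for`, `content_types`, `compare`) and the explicit `Accept: application/json` header are assumptions made for illustration and are not part of Tika.

```python
import http.client
import json

# Host, port and path as used in the curl examples above.
HOST, PORT, PATH = "localhost", 9998, "/rmeta/text"


def headers_for(content_encoding):
    """Build request headers, adding Content-Encoding only when one is given."""
    headers = {"Accept": "application/json"}  # assumption: ask for JSON explicitly
    if content_encoding is not None:
        headers["Content-Encoding"] = content_encoding
    return headers


def content_types(rmeta_json):
    """Return the Content-Type of every metadata object in an rmeta response."""
    return [entry.get("Content-Type") for entry in json.loads(rmeta_json)]


def compare(path):
    """PUT the same file once per header variant and print the Content-Types."""
    with open(path, "rb") as f:
        data = f.read()
    for encoding in ("application/gzip", "gzip", None):
        conn = http.client.HTTPConnection(HOST, PORT)
        try:
            # curl -T uploads with PUT, so the sketch does the same.
            conn.request("PUT", PATH, body=data, headers=headers_for(encoding))
            body = conn.getresponse().read()
        finally:
            conn.close()
        print(encoding, content_types(body))
```

With a server running, `compare("test.pdf.gz")` would print one line per variant, which makes it easy to see which requests return the extra "application/gzip" object.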
 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
