[ 
https://issues.apache.org/jira/browse/TIKA-3007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16998072#comment-16998072
 ] 

Johan commented on TIKA-3007:
-----------------------------

Hi,

OK, we can confirm that your call above works, but we have some questions about 
how this relates to the Content-Type returned by the server and app versions.
----
*Question 1:* Shouldn't the -j option (output metadata as JSON) work on HEIC/HEIF 
images the same way it does on other file types?

We used the images from {{tika-parsers/src/test/resources/test-documents}} to 
illustrate the different results we see, which seem inconsistent to us.

Using -d on the app, or /detect/stream on the server, returns image/heic, which is correct:

 
{code:java}
java -jar tika-app-1.23.jar -d ~/Desktop/testHEIF.heic
image/heic
{code}
 

 
{code:java}
curl -X PUT --data-binary @~/Desktop/testHEIF.heic \
  http://localhost:9998/detect/stream
image/heic
{code}
 

-j on the app returns nothing for HEIC/HEIF images, although it does return 
metadata for a normal JPEG:

 
{code:java}
java -jar tika-app-1.23.jar -j ~/Desktop/testHEIF.heic
# nothing
{code}
 
{code:java}
java -jar tika-app-1.23.jar -j ~/Desktop/baseball.jpg
{"Blue Colorant":"(0.1492, 0.0632, 0.7446)","Bl ...

{code}
That seems odd to us, because -m (metadata without JSON formatting) does work. 
So does -J, which outputs JSON metadata for the file and all of its embedded 
files.

 
{code:java}
java -jar tika-app-1.23.jar -m /Users/butsjoh/Desktop/testHEIF.heic
Content-Length: 13706
Content-Type: application/mp4
X-Parsed-By: org.apache.tika.parser.DefaultParser
X-Parsed-By: org.apache.tika.parser.mp4.MP4Parser
resourceName: testHEIF.heic
{code}
 

 
{code:java}
java -jar tika-app-1.23.jar -J /Users/butsjoh/Desktop/testHEIF.heic
[{"Content-Length":"13706","Content-Type":"application/mp4","X-Parsed-By":["org.apache.tika.parser.DefaultParser","org.apache.tika.parser.mp4.MP4Parser"],"X-TIKA:embedded_depth":"0","X-TIKA:parse_time_millis":"39","resourceName":"testHEIF.heic"}]
{code}
 

So is it expected that -j returns nothing while -m does return metadata? 
According to the CLI docs, -j simply outputs the metadata in JSON format 
("Output metadata in JSON").

 
----
 

*Question 2:* What is the rationale behind the difference between the 
Content-Type and the detected MIME type?

I'm referring back to question 1 here: in the -m and -J output, the Content-Type 
listed for the HEIC/HEIF file is application/mp4. The server also returns 
application/mp4 when we request http://localhost:9998/meta/Content-Type. We 
would like to understand why you consider the Content-Type to be different from 
the MIME type. If we ask only for the metadata (-m, -J or -j on the app jar, 
/meta on the server), it contains no information about the detected MIME type 
at all, so we cannot identify this file as image/heic. That is also why I 
initially created this ticket: we were still getting application/mp4 back 
because we were using /meta instead of /detect/stream.

Can you please explain the rationale behind this difference? The documentation 
does not really say anything about it. To us it makes no sense that Tika still 
handles HEIC/HEIF images as application/mp4, and that you would need to call 
the CLI or server differently to get correct detection.

 

 

> Heic images are detected as "application/mp4" when using tika as server
> -----------------------------------------------------------------------
>
>                 Key: TIKA-3007
>                 URL: https://issues.apache.org/jira/browse/TIKA-3007
>             Project: Tika
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 1.23
>            Reporter: Johan
>            Priority: Blocker
>
> Related to https://issues.apache.org/jira/browse/TIKA-2942
> It seems that detection of HEIC images works in the standalone jar 
> (tika-app-1.23) but not in the server component (tika-server-1.23).
> tika-app-1.23.jar from [https://archive.apache.org/dist/tika/] detects the 
> image as image/heic, but the server component, tika-server-1.23.jar, still 
> returns "application/mp4". Any idea what might be going wrong? Was the code 
> added only to the app jar and not to the server?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
