Gregory Lepore created TIKA-4041:
------------------------------------
Summary: More rigorous file type checking for .arc files
Key: TIKA-4041
URL: https://issues.apache.org/jira/browse/TIKA-4041
Project: Tika
Issue Type: Improvement
Reporter: Gregory Lepore
Attachments: 315.ARC, ACCTG.ARC
I am seeing files with the .arc file extension being identified as
application/x-internet-archive. However, if I'm reading the tika-mimetypes.xml
file correctly, they shouldn't match that type, since they don't start with
filedesc://.
Is it possible there is an additional mimetype match somewhere?
Samples attached.
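For anyone who wants to double-check the samples independently of Tika, here
is a minimal sketch of the prefix test described above. It uses only the JDK.
ArcMagicCheck and its methods are hypothetical names (not part of Tika), and
it assumes the filedesc:// magic is expected at offset 0 of the file.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical helper for illustration only; this is not Tika code.
class ArcMagicCheck {
    // Prefix the report expects at the start of an Internet Archive file.
    // Assumption: the magic sits at offset 0.
    static final byte[] MAGIC = "filedesc://".getBytes(StandardCharsets.US_ASCII);

    // True if the given leading bytes begin with the magic prefix.
    static boolean startsWithMagic(byte[] head) {
        if (head.length < MAGIC.length) {
            return false;
        }
        return Arrays.equals(Arrays.copyOf(head, MAGIC.length), MAGIC);
    }

    // Reads at most MAGIC.length bytes from the file and checks the prefix.
    static boolean startsWithMagic(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            return startsWithMagic(in.readNBytes(MAGIC.length));
        }
    }
}
```

Calling ArcMagicCheck.startsWithMagic(Path.of("315.ARC")) on a local copy of
an attachment would show whether the prefix is present. If it returns false
for a file that is still reported as application/x-internet-archive, the match
presumably comes from something other than this magic, for example a
filename-based rule.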
--
This message was sent by Atlassian Jira
(v8.20.10#820010)