Steve created TIKA-3023:
---------------------------
Summary: Text files starting with MOVI are detected as X-SGI-Movie
Key: TIKA-3023
URL: https://issues.apache.org/jira/browse/TIKA-3023
Project: Tika
Issue Type: Bug
Affects Versions: 1.23
Environment: Issue reproduced on:
    Windows 10 Professional 64-bit, running the runnable JAR
    Ubuntu 16.04.6 LTS, running Tika-Python
Reporter: Steve
Attachments: capitalmovie.txt
If a plain-text file starts with "MOVI", Tika labels it as an SGI Movie
(video/x-sgi-movie).
The ASCII bytes for MOVI are 4D 4F 56 49 in hex, which is the same as the
header for the SGI Movie file format:
[https://reposcope.com/mimetype/video/x-sgi-movie]
Because this SGI format isn't supported, any content in a text file that
starts this way is lost. I've attached a simple file that should reproduce
the problem.
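
To illustrate the collision, here is a minimal Python sketch of a detector
that looks only at the leading magic bytes. It is not Tika's actual detection
code; the names SGI_MOVIE_MAGIC and naive_detect are hypothetical, and the
sample text is made up rather than taken from the attached capitalmovie.txt.

```python
# Hypothetical illustration only; this is NOT Tika's detection logic.
# The magic value is the byte sequence quoted in this report.
SGI_MOVIE_MAGIC = bytes.fromhex("4D4F5649")


def naive_detect(data: bytes) -> str:
    """Guess a MIME type from the leading bytes alone."""
    if data.startswith(SGI_MOVIE_MAGIC):
        return "video/x-sgi-movie"
    return "text/plain"


# Made-up sample content: ordinary text that happens to begin with "MOVI".
sample = "MOVIE NIGHT\nBring popcorn.\n".encode("ascii")

# Show the first four bytes as hex, then the (wrong) detection result.
print(" ".join(f"{b:02X}" for b in sample[:4]))
print(naive_detect(sample))
```

A check based purely on the first four bytes cannot tell this text apart from
a real SGI Movie file, which matches the behaviour described above.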
--
This message was sent by Atlassian Jira
(v8.3.4#803005)