Giorgiana Ciobanu created TIKA-3810:
---------------------------------------

             Summary: Vtt file (encoding UTF-8 with BOM) seen as text/plain
                 Key: TIKA-3810
                 URL: https://issues.apache.org/jira/browse/TIKA-3810
             Project: Tika
          Issue Type: Bug
          Components: core, detector, mime
    Affects Versions: 2.3.0
            Reporter: Giorgiana Ciobanu
         Attachments: s5_windowEncoding_validFormat.vtt

A VTT file created on Windows (UTF-8 with BOM) is incorrectly detected as
text/plain; it should be detected as text/vtt.

The application that uses Tika, and to which the file is uploaded for MIME
type detection, runs on a Unix machine.

The VTT file is passed as an InputStream to Tika's default detector (we do not
want to detect the MIME type from the file extension).
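
As a minimal illustration of why the BOM may matter for stream-based
detection, the sketch below compares a signature check fixed at offset 0 with
one that skips a leading UTF-8 BOM (bytes EF BB BF) before looking for the
"WEBVTT" signature. It does not use Tika: the class and method names are
hypothetical, and whether Tika 2.3.0 matches text/vtt this way is an
assumption, not something verified against its source.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-alone sketch; not Tika code.
class VttBomSketch {
    // UTF-8 byte order mark.
    static final byte[] UTF8_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};
    // Signature that opens a WebVTT file (a BOM may precede it).
    static final byte[] WEBVTT = "WEBVTT".getBytes(StandardCharsets.US_ASCII);

    // True if data contains prefix starting exactly at offset.
    static boolean startsWith(byte[] data, int offset, byte[] prefix) {
        if (offset < 0 || data.length - offset < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (data[offset + i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    // Assumed naive behaviour: the signature must sit at offset 0.
    static boolean matchesAtOffsetZero(byte[] data) {
        return startsWith(data, 0, WEBVTT);
    }

    // Same check, but a leading UTF-8 BOM is skipped first.
    static boolean matchesAllowingBom(byte[] data) {
        int offset = startsWith(data, 0, UTF8_BOM) ? UTF8_BOM.length : 0;
        return startsWith(data, offset, WEBVTT);
    }

    public static void main(String[] args) {
        byte[] plain = "WEBVTT\n\n00:00.000 --> 00:01.000\nHello\n"
                .getBytes(StandardCharsets.UTF_8);
        // Prepend the BOM, as a Windows editor saving "UTF-8 with BOM" would.
        byte[] withBom = new byte[UTF8_BOM.length + plain.length];
        System.arraycopy(UTF8_BOM, 0, withBom, 0, UTF8_BOM.length);
        System.arraycopy(plain, 0, withBom, UTF8_BOM.length, plain.length);

        System.out.println("no BOM, offset 0:  " + matchesAtOffsetZero(plain));
        System.out.println("BOM, offset 0:     " + matchesAtOffsetZero(withBom));
        System.out.println("BOM, BOM-tolerant: " + matchesAllowingBom(withBom));
    }
}
```

If the detector behaves like the offset-0 check, the three BOM bytes alone
would be enough to make the file fall through to text/plain.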

Please find attached the VTT file that Tika detects as text/plain.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)