[ https://issues.apache.org/jira/browse/TIKA-3810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17562532#comment-17562532 ]
Nick Burch commented on TIKA-3810:
----------------------------------
Looks like we had detection magic for the UTF-16 BOM variants but not for the
UTF-8 one. Fixed in 9d928bbf9.
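
To illustrate the kind of check the comment above refers to, here is a minimal
sketch in Python. It is not Tika's implementation and not the change made in
the commit; the function name sniff_mime and the fallback to text/plain are
assumptions made for illustration. It relies only on two well-known facts: a
UTF-8 BOM is the byte sequence EF BB BF, and a WebVTT file starts with an
optional BOM followed by the string "WEBVTT" and then a space, a tab, a line
terminator, or end of file.

```python
import codecs

# Signature that opens every WebVTT file (after an optional BOM).
WEBVTT_SIGNATURE = b"WEBVTT"


def sniff_mime(head: bytes) -> str:
    """Hypothetical helper: guess a MIME type from the first bytes of a stream.

    Returns "text/vtt" when the bytes look like WebVTT, with or without a
    UTF-8 BOM, and falls back to "text/plain" otherwise (a simplification).
    """
    # Skip the UTF-8 BOM (EF BB BF) if present, so the signature check
    # below sees the same bytes for BOM and BOM-less files.
    if head.startswith(codecs.BOM_UTF8):
        head = head[len(codecs.BOM_UTF8):]

    if not head.startswith(WEBVTT_SIGNATURE):
        return "text/plain"

    # The signature must be followed by end of input, a space, a tab,
    # or a line terminator; "WEBVTTX" is not a valid header.
    rest = head[len(WEBVTT_SIGNATURE):]
    if rest == b"" or rest[:1] in (b" ", b"\t", b"\n", b"\r"):
        return "text/vtt"
    return "text/plain"
```

A file saved on Windows as UTF-8 with BOM would start with the bytes
EF BB BF 57 45 42 56 54 54; without the BOM-skipping step, a check anchored at
offset 0 would miss the signature and fall through to the plain-text result.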
> Vtt file (encoding UTF-8 with BOM) seen as text/plain
> -----------------------------------------------------
>
> Key: TIKA-3810
> URL: https://issues.apache.org/jira/browse/TIKA-3810
> Project: Tika
> Issue Type: Bug
> Components: core, detector, mime
> Affects Versions: 2.3.0
> Reporter: Giorgiana Ciobanu
> Priority: Major
> Attachments: s5_windowEncoding_validFormat.vtt
>
>
> A VTT file created on Windows (UTF-8 with BOM) is incorrectly detected as
> _text/plain_; it should be detected as _text/vtt_.
> The application that uses Tika, and to which the file is uploaded for MIME
> type detection, runs on a Unix machine.
> The VTT file is passed as an InputStream to Tika's default detector (we
> don't want to detect the MIME type from the file extension).
> Please find attached the VTT file that Tika detects as text/plain.