[ https://issues.apache.org/jira/browse/TIKA-3811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17562537#comment-17562537 ]

Nick Burch commented on TIKA-3811:
----------------------------------

You should not rely on Apache Tika's detection for anything security-related. 
We do not protect against someone maliciously adding MIME magic near the start 
of a file in a way that still lets the intended application process the 
underlying file. We err on the side of giving a best-guess answer.
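
To illustrate why magic-based detection can only ever be a best guess, here is 
a minimal sketch. It is not Tika's implementation: the class MagicSpoofDemo, 
the method guessType and the sample payload are all hypothetical and made up 
for this example. It only shows that a detector which looks at the leading 
bytes can be steered by whoever controls those bytes.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class MagicSpoofDemo {

    // Leading bytes of a PDF file: "%PDF-".
    private static final byte[] PDF_MAGIC =
            "%PDF-".getBytes(StandardCharsets.US_ASCII);

    /**
     * Hypothetical toy detector (NOT Tika's logic): it inspects only the
     * leading bytes and reports a best guess.
     */
    public static String guessType(byte[] content) {
        if (content.length >= PDF_MAGIC.length
                && Arrays.equals(Arrays.copyOf(content, PDF_MAGIC.length), PDF_MAGIC)) {
            return "application/pdf";
        }
        return "application/octet-stream";
    }

    public static void main(String[] args) {
        // Stand-in for content whose real consumer tolerates leading junk
        // (an assumption made for this sketch).
        String payload = "<html><body>payload</body></html>";

        byte[] original = payload.getBytes(StandardCharsets.US_ASCII);
        // The same payload with PDF magic prepended by an attacker.
        byte[] spoofed = ("%PDF-1.7\n" + payload).getBytes(StandardCharsets.US_ASCII);

        System.out.println(guessType(original)); // application/octet-stream
        System.out.println(guessType(spoofed));  // application/pdf
    }
}
```

The payload is identical in both cases; only the prefix differs, yet the 
reported type changes. A real detector checks far more patterns, but the same 
limitation applies to any approach that trusts bytes the sender controls.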

For the "what is this probably" case, Tika is great. For the "what parser is 
most likely to manage to get text out" case, Tika is great. For the "what is 
this for certain, even if it is malicious" case, you need a different tool for 
your detection.

See also 
https://cwiki.apache.org/confluence/display/TIKA/The+Robustness+of+Apache+Tika 
for advice on running Tika with untrusted input.

> Exclude NameDetector not working for Tika.detect(file)
> ------------------------------------------------------
>
>                 Key: TIKA-3811
>                 URL: https://issues.apache.org/jira/browse/TIKA-3811
>             Project: Tika
>          Issue Type: Bug
>          Components: config, core, detector
>    Affects Versions: 2.3.0
>            Reporter: Giorgiana Ciobanu
>            Priority: Major
>         Attachments: invalid_format.vtt, tika-config_test.xml
>
>
> I need to detect the MIME type of a file, but for security reasons I want to 
> exclude detection by file name extension. 
> I added a tika-config_test.xml (see attached) to my unit test, but Tika still 
> detects the type from the file name extension.
> I attached a test file that is wrongly detected as text/vtt because of its 
> file extension; it should be text/plain in this case.
>  
> The code of my unit test:
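> {code:java}
> // Fragment from a test method. Needs java.io.File, org.apache.tika.Tika
> // and org.apache.tika.config.TikaConfig.
> // Load the test file and the custom Tika configuration from the classpath.
> File file = new File(getClass().getClassLoader()
>         .getResource("invalid_format.vtt").getFile());
> TikaConfig tikaConfig = new TikaConfig(getClass()
>         .getClassLoader()
>         .getResourceAsStream("tika-config_test.xml"));
>  
> // Returns text/vtt, but text/plain is expected.
> String mimeType = new Tika(tikaConfig).detect(file);
> {code}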
>  


