[ 
https://issues.apache.org/jira/browse/TIKA-1522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16804228#comment-16804228
 ] 

Brian Jackson edited comment on TIKA-1522 at 3/28/19 7:38 PM:
--------------------------------------------------------------

I just ran into this same issue: an exe came back as 
{{application/x-msdownload; format=pe}} when I was expecting 
{{application/x-dosexec}}, similar to the attached exe. I'm not sure what other 
MIME tools do, but if you check the exe at 
[https://htmlstrip.com/mime-file-type-checker] it comes back as 
{{application/x-dosexec}}. They say they don't just look at the file extension, 
though I have no way to confirm that.

Ultimately this was only an issue because we are using Tika to check whether 
files are what they claim to be (if you upload a txt file, is it really a text 
file?). My plan was to call {{detect}} to get the mime / media type, then use 
the {{MimeType}} {{getExtensions()}} functionality to check whether the file's 
extension is among the extensions of the detected mime type. This would cause 
problems validating certain exes, since the exe extension is in neither the 
detected MimeType's extensions nor the extensions of any of its supertypes. I 
wonder whether the original suggestion on the ticket, _*.exe must be included 
in application/x-msdownload glob pattern_, would solve this, because exe would 
then be recognized as a valid extension of {{application/x-msdownload}}.
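
The extension check described above can be sketched roughly as follows. This is a minimal sketch that deliberately avoids the Tika API: the two maps are hypothetical stand-ins for what {{MimeType.getExtensions()}} and Tika's media type hierarchy would supply in real code, and the class name, method name, and sample data are all assumptions made for illustration, not Tika's actual registry contents.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

class ExtensionCheck {

    /**
     * Returns true if the file name's extension is registered for the
     * detected type or for any of its supertypes.
     *
     * extensionsByType and supertypeOf are hypothetical stand-ins for the
     * data a MIME registry would provide; extensions include the leading dot.
     */
    static boolean extensionMatches(String fileName,
                                    String detectedType,
                                    Map<String, List<String>> extensionsByType,
                                    Map<String, String> supertypeOf) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0) {
            return false; // no extension to compare against
        }
        String ext = fileName.substring(dot).toLowerCase(Locale.ROOT);

        // Walk from the detected type up through its supertypes.
        // The 'seen' set guards against cycles in the stand-in hierarchy.
        Set<String> seen = new HashSet<>();
        String type = detectedType;
        while (type != null && seen.add(type)) {
            if (extensionsByType.getOrDefault(type, List.of()).contains(ext)) {
                return true;
            }
            type = supertypeOf.get(type); // null once the chain ends
        }
        return false;
    }

    public static void main(String[] args) {
        // Illustrative data only: ".exe" is registered on a type that is
        // not a supertype of the detected one, as in the case described.
        Map<String, List<String>> extensions = Map.of(
                "application/x-msdownload", List.of(".dll"),
                "application/x-dosexec", List.of(".exe"));
        Map<String, String> supertypes = Map.of(
                "application/x-msdownload; format=pe", "application/x-msdownload");

        boolean ok = extensionMatches("Search.exe",
                "application/x-msdownload; format=pe", extensions, supertypes);
        System.out.println("Search.exe accepted: " + ok);
    }
}
```

With sample data shaped like this, the check rejects the exe because no type on the chain from the detected type upward lists that extension. Adding the extension to the {{application/x-msdownload}} entry, as the ticket suggests, would make the same call succeed.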



> Exe being detected as application/x-msdownload
> ----------------------------------------------
>
>                 Key: TIKA-1522
>                 URL: https://issues.apache.org/jira/browse/TIKA-1522
>             Project: Tika
>          Issue Type: Bug
>          Components: mime
>    Affects Versions: 1.7
>            Reporter: Luis Filipe Nassif
>            Priority: Minor
>         Attachments: Search.exe
>
>
> If it is ok, *.exe must be included in application/x-msdownload glob pattern 
> definitions. If it should be detected as application/x-dosexec, the hierarchy 
> between application/x-dosexec, application/x-msdownload and PE based formats 
> must be changed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
