[
https://issues.apache.org/jira/browse/TIKA-1522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16804228#comment-16804228
]
Brian Jackson edited comment on TIKA-1522 at 3/28/19 7:38 PM:
--------------------------------------------------------------
I just ran into this same issue: an exe came back as
{{application/x-msdownload; format=pe}} when I was expecting
{{application/x-dosexec}}, similar to the attached exe. I'm not sure what other
MIME tools out there do, but if you check the exe at
[https://htmlstrip.com/mime-file-type-checker] it comes back with
{{application/x-dosexec}}; they say they don't just look at the file extension,
but who knows.
Ultimately this was only an issue because we are using Tika to check whether
files are what they claim to be (if you upload a txt file, is it really a text
file?). My plan was to use {{detect}} to get the mime / media type, then call
{{getExtensions()}} on the corresponding {{MimeType}} and check whether the
file's extension is among the extensions of the detected mime type. That check
fails for certain exes, because the exe extension is not in the {{MimeType}}'s
extensions or in the extensions of any of its supertypes. I wonder whether the
original suggestion on the ticket, _*.exe must be included in
application/x-msdownload glob pattern_, would solve this, because exe would
then be recognized as a valid extension of {{application/x-msdownload}}.
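
To make the check concrete, here is a minimal sketch in plain Java. It does not call Tika: the {{ExtensionCheck}} class, its {{REGISTRY}} table and the {{extensionMatches}} helper are all hypothetical stand-ins, and the table entries are illustrative assumptions rather than Tika's actual type definitions. In real code the detected type would come from {{detect}} and the extensions from {{MimeType}} {{getExtensions()}}.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in for a MIME registry; this is NOT Tika's API or data.
class ExtensionCheck {

    // One registry entry: declared extensions plus an optional supertype.
    static final class TypeInfo {
        final String supertype; // null when the type has no supertype
        final List<String> extensions;

        TypeInfo(String supertype, String... extensions) {
            this.supertype = supertype;
            this.extensions = Arrays.asList(extensions);
        }
    }

    // Illustrative entries only (assumption): they mirror the situation in
    // this comment, where ".exe" is not listed for application/x-msdownload.
    static final Map<String, TypeInfo> REGISTRY = new HashMap<>();
    static {
        REGISTRY.put("application/x-dosexec", new TypeInfo(null, ".exe"));
        REGISTRY.put("application/x-msdownload", new TypeInfo(null, ".dll"));
        REGISTRY.put("text/plain", new TypeInfo(null, ".txt"));
        REGISTRY.put("text/x-log", new TypeInfo("text/plain", ".log"));
    }

    // Drops parameters such as "; format=pe" from a detected type string.
    static String baseType(String detected) {
        int semi = detected.indexOf(';');
        return (semi < 0 ? detected : detected.substring(0, semi)).trim();
    }

    // True if the file's extension is declared by the detected type or by
    // any type on its supertype chain.
    static boolean extensionMatches(String detected, String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0) {
            return false; // no extension to compare
        }
        String ext = fileName.substring(dot).toLowerCase(Locale.ROOT);
        String type = baseType(detected);
        while (type != null) {
            TypeInfo info = REGISTRY.get(type);
            if (info == null) {
                return false; // unknown type, cannot validate
            }
            if (info.extensions.contains(ext)) {
                return true;
            }
            type = info.supertype; // walk up to the supertype, if any
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(extensionMatches("application/x-dosexec", "Search.exe"));
        System.out.println(extensionMatches("application/x-msdownload; format=pe", "Search.exe"));
    }
}
```

With this table, the second call in {{main}} is the failing case from this comment: the exe is rejected because neither the detected type nor any supertype lists the exe extension. Adding ".exe" to the {{application/x-msdownload}} entry would make it pass, which is the effect the glob-pattern suggestion on the ticket would have.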
> Exe being detected as application/x-msdownload
> ----------------------------------------------
>
> Key: TIKA-1522
> URL: https://issues.apache.org/jira/browse/TIKA-1522
> Project: Tika
> Issue Type: Bug
> Components: mime
> Affects Versions: 1.7
> Reporter: Luis Filipe Nassif
> Priority: Minor
> Attachments: Search.exe
>
>
> If this detection is correct, *.exe must be included in the
> application/x-msdownload glob pattern definitions. If the file should instead
> be detected as application/x-dosexec, the hierarchy between
> application/x-dosexec, application/x-msdownload and PE based formats must be
> changed.