[ https://issues.apache.org/jira/browse/TIKA-790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13157197#comment-13157197 ]
Nick Burch commented on TIKA-790:
---------------------------------
One possible way to handle the few extra types that POIFSDocumentType has (such
as Encrypted) is to add a parameter to the mimetype returned by
POIFSContainerDetector. For example, for an Encrypted file it could return
"application/x-tika-msoffice; format=encrypted"
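
To make the parameter idea concrete, here is a minimal sketch in plain Java of building such a parameterised type string and reading the parameter back out. It deliberately uses only the standard library rather than Tika's own classes; the class and method names (MediaTypeParamSketch, withParameter, baseType, parameters) are hypothetical and not part of Tika, and the parsing ignores quoted values for brevity.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not a Tika class: illustrates "type; name=value" handling only.
public class MediaTypeParamSketch {

    /** Appends a single "name=value" parameter to a base type string. */
    public static String withParameter(String baseType, String name, String value) {
        return baseType + "; " + name + "=" + value;
    }

    /** Returns the part before the first ';', trimmed. */
    public static String baseType(String mediaType) {
        int semi = mediaType.indexOf(';');
        return (semi < 0 ? mediaType : mediaType.substring(0, semi)).trim();
    }

    /** Parses "; name=value" pairs. Quoting and escaping are not handled. */
    public static Map<String, String> parameters(String mediaType) {
        Map<String, String> params = new LinkedHashMap<>();
        String[] parts = mediaType.split(";");
        // parts[0] is the base type, so parameters start at index 1
        for (int i = 1; i < parts.length; i++) {
            int eq = parts[i].indexOf('=');
            if (eq > 0) {
                params.put(parts[i].substring(0, eq).trim(),
                           parts[i].substring(eq + 1).trim());
            }
        }
        return params;
    }

    public static void main(String[] args) {
        String detected = withParameter("application/x-tika-msoffice", "format", "encrypted");
        System.out.println(detected);
        System.out.println(baseType(detected));
        System.out.println(parameters(detected).get("format"));
    }
}
```

A caller that only cares about the container type can compare on baseType(...), while the office parser could look at the "format" parameter to decide how to treat the file.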
> Reduce duplication between POIFSDocumentType (in OfficeParser) and
> POIFSContainerDetector
> -----------------------------------------------------------------------------------------
>
> Key: TIKA-790
> URL: https://issues.apache.org/jira/browse/TIKA-790
> Project: Tika
> Issue Type: Improvement
> Components: parser
> Affects Versions: 1.0
> Reporter: Nick Burch
> Assignee: Nick Burch
>
> For historical reasons, we now have two parts of Tika that try to identify
> the type of an OLE2-based file.
> POIFSDocumentType is able to detect a few kinds of files that
> POIFSContainerDetector cannot (e.g. Encrypted and OLE Native), most of
> which may not map well onto mimetypes. POIFSDocumentType also lacks some of
> the logic in the main detector, and only covers the file types that the
> office parser supports.
> We should probably try to reduce the duplication. One option is to add the
> few extra types to the Detector somehow; the other is to use the detector
> first and do additional specific checks afterwards.
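
The second option in the description (detector first, specific checks after) might look roughly like the sketch below. It is plain Java with no Tika or POI dependency: the class TwoStepOle2Sketch and its methods are hypothetical, the top-level entry names it looks for are assumptions about what an OLE2 container holds, and the "format=ole10_native" value is invented here purely for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch, not Tika code: general detection first, extra checks second.
public class TwoStepOle2Sketch {

    static final String GENERIC = "application/x-tika-msoffice";

    /** Step 1: stand-in for the general detector, keyed on top-level entry names (assumed). */
    public static String detect(Set<String> entryNames) {
        if (entryNames.contains("WordDocument")) return "application/msword";
        if (entryNames.contains("Workbook")) return "application/vnd.ms-excel";
        if (entryNames.contains("PowerPoint Document")) return "application/vnd.ms-powerpoint";
        return GENERIC;
    }

    /** Step 2: extra checks, applied only when step 1 could not be more specific. */
    public static String refine(String detected, Set<String> entryNames) {
        if (!GENERIC.equals(detected)) return detected;
        if (entryNames.contains("EncryptedPackage")) return GENERIC + "; format=encrypted";
        // Entry name and parameter value below are assumptions for illustration
        if (entryNames.contains("\u0001Ole10Native")) return GENERIC + "; format=ole10_native";
        return detected;
    }

    /** Convenience wrapper that runs both steps on a list of entry names. */
    public static String classify(String... entryNames) {
        Set<String> names = new HashSet<>(Arrays.asList(entryNames));
        return refine(detect(names), names);
    }

    public static void main(String[] args) {
        System.out.println(classify("WordDocument", "1Table"));
        System.out.println(classify("EncryptedPackage", "EncryptionInfo"));
        System.out.println(classify("SomethingElse"));
    }
}
```

Keeping the extra checks in a separate step means the general detector stays unchanged, at the cost of the caller having to walk the entry names a second time.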