[ https://issues.apache.org/jira/browse/TIKA-620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Nick Burch resolved TIKA-620.
-----------------------------
Resolution: Fixed
Fix Version/s: 1.0
Jukka's idea for parts 2 and 3 implemented in r1085003, along with unit tests.
> Parsers and non-canonical mimetypes
> -----------------------------------
>
> Key: TIKA-620
> URL: https://issues.apache.org/jira/browse/TIKA-620
> Project: Tika
> Issue Type: Improvement
> Components: parser
> Affects Versions: 0.9
> Reporter: Nick Burch
> Fix For: 1.0
>
>
> As discovered through TIKA-555, we weren't handling non-canonical
> mimetypes properly.
> There are three bits to this:
> * The default parser as created by TikaConfig.getDefaultConfig() needs to
> have the full mime registry passed in automatically, so it can walk the
> mime tree. Fixed in r1084801.
> * The Composite Parser needs to handle child parsers that declare they
> support an alias rather than the canonical mimetype. Initial fix in r1084798;
> Jukka has an idea for a cleaner way.
> * When the Composite Parser looks for a parser for a mime type, it'll need
> to canonicalise the type before checking. (This isn't a problem for the Auto
> Detect parser, but can be for others.)
> We also probably need a couple more unit tests for this, as TIKA-555 had
> broken auto detect parsing of BMP (though not direct use of ImageParser),
> yet none of our tests noticed...
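
For anyone reading along, the second and third points above can be sketched roughly as follows. This is only an illustration and not the code committed to Tika: AliasRegistry and CompositeLookup are hypothetical stand-ins for the mime registry and the Composite Parser, parsers are represented by name only, and which of the two BMP type names is treated as canonical is an assumption made for the example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a mime registry: maps aliases to canonical names.
final class AliasRegistry {
    private final Map<String, String> aliasToCanonical = new HashMap<>();

    void addAlias(String alias, String canonical) {
        aliasToCanonical.put(alias, canonical);
    }

    // Returns the canonical name, or the input unchanged if it is not a known alias.
    String canonicalise(String type) {
        return aliasToCanonical.getOrDefault(type, type);
    }
}

// Hypothetical composite lookup; a real composite would hold parser instances.
final class CompositeLookup {
    private final AliasRegistry registry;
    private final Map<String, String> parserByCanonicalType = new HashMap<>();

    CompositeLookup(AliasRegistry registry) {
        this.registry = registry;
    }

    // Index a child parser under the canonical form of the type it declares,
    // so a parser that declares only an alias is still found.
    void register(String parserName, String declaredType) {
        parserByCanonicalType.put(registry.canonicalise(declaredType), parserName);
    }

    // Canonicalise the requested type before checking; null if nothing matches.
    String findParser(String requestedType) {
        return parserByCanonicalType.get(registry.canonicalise(requestedType));
    }
}
```

With "image/bmp" registered as an alias of "image/x-ms-bmp" (again, an assumed direction), a parser that declares only "image/bmp" is found whether the caller asks for the alias or the canonical name, because both sides of the lookup go through canonicalise().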