[ 
https://issues.apache.org/jira/browse/TIKA-620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Burch resolved TIKA-620.
-----------------------------

       Resolution: Fixed
    Fix Version/s: 1.0

Jukka's idea for parts 2 and 3 implemented in r1085003, along with unit tests.

> Parsers and non-canonical mimetypes
> -----------------------------------
>
>                 Key: TIKA-620
>                 URL: https://issues.apache.org/jira/browse/TIKA-620
>             Project: Tika
>          Issue Type: Improvement
>          Components: parser
>    Affects Versions: 0.9
>            Reporter: Nick Burch
>             Fix For: 1.0
>
>
> As discovered through TIKA-555, we weren't correctly handling non-canonical 
> mimetypes.
> There are three bits to this:
>  * The default parser as created by TikaConfig.getDefaultConfig() needs the 
> full mime registry passing in automatically, so it can walk the mime tree. 
> Fixed in r1084801
>  * The Composite Parser needs to handle child parsers that declare they 
> support an alias rather than the canonical mimetype. Initial fix in r1084798; 
> Jukka has an idea for a cleaner way
>  * When Composite Parser looks for a parser for a mime type, it'll need to 
> canonicalise the type before checking. (This isn't a problem for the Auto 
> Detect parser, but can be for others.)
> We also probably need a couple more unit tests for this, as TIKA-555 had 
> broken auto detect parsing of bmp (though not direct use of ImageParser), 
> yet none of our tests noticed...
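
For anyone reading along, here is a rough sketch of the idea behind parts 2 and
3 above: canonicalise the type both when a child parser is registered and when
a parser is looked up, so an alias and its canonical form always meet at the
same key. This is not the code from r1085003 and does not use Tika's classes.
The class AliasAwareLookup, its methods, and the plain-String parser names are
all hypothetical, and the choice of image/x-ms-bmp as canonical with image/bmp
as its alias is only an assumption for illustration.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in for a mime registry plus a composite parser's lookup table.
class AliasAwareLookup {
    // alias -> canonical type (assumed flat, i.e. no alias of an alias)
    private final Map<String, String> aliasToCanonical = new HashMap<>();
    // canonical type -> parser name (a String stands in for a real parser object)
    private final Map<String, String> parserByCanonicalType = new HashMap<>();

    // Record that 'alias' is another name for 'canonical'.
    // Aliases must be added before parsers are registered under them.
    public void addAlias(String canonical, String alias) {
        aliasToCanonical.put(key(alias), key(canonical));
    }

    // Map an alias to its canonical type; unknown types are returned unchanged.
    public String canonicalise(String type) {
        String k = key(type);
        return aliasToCanonical.getOrDefault(k, k);
    }

    // Part 2: a child parser may declare an alias, so store it under the canonical type.
    public void register(String parserName, String... supportedTypes) {
        for (String type : supportedTypes) {
            parserByCanonicalType.put(canonicalise(type), parserName);
        }
    }

    // Part 3: canonicalise the requested type before checking the table.
    // Returns null when no parser is registered for the type.
    public String findParser(String type) {
        return parserByCanonicalType.get(canonicalise(type));
    }

    // Case-insensitive, whitespace-trimmed lookup key.
    private static String key(String type) {
        return type.trim().toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        AliasAwareLookup lookup = new AliasAwareLookup();
        lookup.addAlias("image/x-ms-bmp", "image/bmp");
        lookup.register("ImageParser", "image/bmp"); // parser declares the alias
        System.out.println(lookup.findParser("image/x-ms-bmp"));
        System.out.println(lookup.findParser("image/bmp"));
    }
}
```

With both sides normalised, it no longer matters whether the parser or the
caller used the alias. A unit test of the kind asked for above would register a
parser under one name and look it up under the other.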

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
