[ 
https://issues.apache.org/jira/browse/TIKA-527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jukka Zitting resolved TIKA-527.
--------------------------------

       Resolution: Fixed
    Fix Version/s: 1.0
         Assignee: Jukka Zitting

Sorry for the long delay on this. I committed your patch with slight 
modifications in revision 1125596. Thanks!

I guess that's all there is for this issue, so resolving as fixed.

> Allow override mapping mime<-->parsers through config
> -----------------------------------------------------
>
>                 Key: TIKA-527
>                 URL: https://issues.apache.org/jira/browse/TIKA-527
>             Project: Tika
>          Issue Type: Improvement
>          Components: config
>    Affects Versions: 0.7
>            Reporter: Jan Høydahl
>            Assignee: Jukka Zitting
>             Fix For: 1.0
>
>         Attachments: TIKA-527.patch
>
>
> Background
> -----------------
> As of Tika 0.7, tika-config.xml is no longer mandatory and loading 3rd party 
> parsers as plugins through the service architecture is supported.
> This introduces great flexibility, and even allows for extending Tika's file 
> format support by simply dropping in JARs on the classpath. This is great 
> for configuring Tika when it's embedded as part of another application such 
> as Solr or Nutch. You can easily add support for e.g. a commercial document 
> filter with a Tika wrapper without changing Tika or the consuming application, 
> or even maintaining a tika-config.xml.
> This serves the majority of all use cases.
>
> Problem
> -------
> However, as the variety of 3rd party document parsers increases, we'll start 
> seeing an overlap of parsers supporting the same mime-types. A very likely 
> scenario is a company specialized in document filters packaging their parsers 
> as a Tika plugin, under whatever license they choose.
> In this scenario, a system integrator (working with e.g. Solr) wants to 
> gather all the parsers that the particular customer needs, and then choose 
> which parser should handle each mime-type. She may want to let a 3rd party 
> parser plugin handle Word files but the Tika supplied POI parser handle Excel.
> Today, the last parser plugin that gets loaded by the class-loader happens to 
> "win" the mime-types it supports. As it is not uncommon for one parser to 
> register multiple mime-types, re-claiming a subset of the types is not 
> possible unless you are consuming Tika directly.
> We thus need an "override" mime-to-parser mapping by configuration, and Tika 
> needs to look for this config by default when starting.
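
The "last loaded wins" behaviour and the requested override can be illustrated with a minimal, self-contained sketch. It does not use Tika's actual classes or the code committed for this issue; `ParserOverrideSketch`, `PluginParser`, `resolve` and the parser names are hypothetical, chosen only to mirror the Word/Excel scenario described above.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ParserOverrideSketch {

    /** Hypothetical stand-in for a parser plugin: a name plus the MIME types it claims. */
    static final class PluginParser {
        final String name;
        final List<String> mimeTypes;

        PluginParser(String name, List<String> mimeTypes) {
            this.name = name;
            this.mimeTypes = mimeTypes;
        }
    }

    /**
     * Builds the MIME-type-to-parser table. Parsers are registered in load order,
     * so a later parser replaces an earlier one for any type both claim.
     * Explicit overrides are applied last and therefore always win.
     */
    static Map<String, String> resolve(List<PluginParser> loadOrder,
                                       Map<String, String> overrides) {
        Map<String, String> table = new LinkedHashMap<>();
        for (PluginParser parser : loadOrder) {
            for (String type : parser.mimeTypes) {
                table.put(type, parser.name); // last loaded parser wins
            }
        }
        table.putAll(overrides); // configured mapping takes precedence
        return table;
    }

    public static void main(String[] args) {
        // Both parsers claim Word and Excel; the vendor parser is loaded last.
        List<PluginParser> loadOrder = Arrays.asList(
                new PluginParser("PoiParser",
                        Arrays.asList("application/msword", "application/vnd.ms-excel")),
                new PluginParser("VendorParser",
                        Arrays.asList("application/msword", "application/vnd.ms-excel")));

        // The integrator hands Excel back to the POI-based parser.
        Map<String, String> overrides = new LinkedHashMap<>();
        overrides.put("application/vnd.ms-excel", "PoiParser");

        System.out.println(resolve(loadOrder, overrides));
    }
}
```

Without the override map the vendor parser would end up handling both types simply because it was loaded last; with it, only the explicitly listed type is reassigned, and the rest of the discovered mapping is left alone.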

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
