[jira] [Commented] (TIKA-1528) Add an OverrideDetector that overrides other detectors

Nick Burch (JIRA) Thu, 22 Jan 2015 14:06:12 -0800

    [ 
https://issues.apache.org/jira/browse/TIKA-1528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14288297#comment-14288297
 ]


Nick Burch commented on TIKA-1528:
----------------------------------

I'm struggling to visualise how this all fits together, sorry :(

Maybe, if you wouldn't mind... On the main SQL parser JIRA, SVN branch or 
GitHub branch, is there any chance you could put together a bit of pseudo-code, 
some stubbed-out classes or a description of how all of these parts will fit 
together (parser, detection, extractor, table parsers, etc.)? That would 
certainly help me see how it goes together. I might be able to offer some 
advice, but I'm fairly sure someone in our community will be able to offer 
great suggestions from it!

> Add an OverrideDetector that overrides other detectors
> ------------------------------------------------------
>
>                 Key: TIKA-1528
>                 URL: https://issues.apache.org/jira/browse/TIKA-1528
>             Project: Tika
>          Issue Type: Improvement
>            Reporter: Tim Allison
>            Priority: Minor
>
> While working on TIKA-1511, I found a need to bypass our current detection 
> mechanism.  I think that there are other use cases for this.  The idea is 
> that a client or a tika-internal call wants to specify the Content-Type for a 
> document and bypass the regular mime detection chain.
> We currently have the TypeDetector that returns the "Content-Type" as 
> specified in the Metadata, but there are two deficiencies in using that class 
> for this purpose:
> * Content-Type is currently ambiguous: when it comes into a Parser or 
> Detector, it could be used either as a hint or as a direction.  I'd like the 
> OverrideDetector to use a different metadata key from our usual "Content-Type".
> * The TypeDetector's position in the detection order is based on the 
> alphabetical order of its class name.  I'd like the OverrideDetector to run 
> first and then short-circuit (bypass) the other detectors.
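The two requirements in the issue (a dedicated metadata key, and running first with a short circuit) can be illustrated with a minimal Java sketch. It assumes simplified stand-in types rather than Tika's real API: `SimpleDetector` is a hypothetical substitute for Tika's `Detector` interface (which takes an `InputStream` and a `Metadata` object, returns a `MediaType`, and declares `IOException`), metadata is a plain `Map`, and the key `X-Tika-Override-Content-Type` is a hypothetical name, since the issue only says it should differ from "Content-Type". The fallback loop is a simplified "first non-generic answer wins" rule, not Tika's own composite detection logic.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class OverrideDetectorSketch {

    // Hypothetical, simplified stand-in for Tika's Detector interface.
    interface SimpleDetector {
        String detect(InputStream input, Map<String, String> metadata);
    }

    // Hypothetical key, deliberately distinct from the ambiguous "Content-Type".
    static final String OVERRIDE_KEY = "X-Tika-Override-Content-Type";
    static final String OCTET_STREAM = "application/octet-stream";

    // Returns the caller-specified type, or null to signal "no override given".
    static final class OverrideDetector implements SimpleDetector {
        @Override
        public String detect(InputStream input, Map<String, String> metadata) {
            String override = metadata.get(OVERRIDE_KEY);
            return (override == null || override.isBlank()) ? null : override.trim();
        }
    }

    // Consults the override first; only if it is absent does the regular chain run.
    static final class ShortCircuitDetector implements SimpleDetector {
        private final OverrideDetector override = new OverrideDetector();
        private final List<SimpleDetector> chain;

        ShortCircuitDetector(List<SimpleDetector> chain) {
            this.chain = chain;
        }

        @Override
        public String detect(InputStream input, Map<String, String> metadata) {
            String forced = override.detect(input, metadata);
            if (forced != null) {
                return forced; // bypass the regular detection chain entirely
            }
            // Simplified fallback: first detector with a non-generic answer wins.
            for (SimpleDetector d : chain) {
                String type = d.detect(input, metadata);
                if (type != null && !OCTET_STREAM.equals(type)) {
                    return type;
                }
            }
            return OCTET_STREAM;
        }
    }

    public static void main(String[] args) {
        // Stub "regular" detector that always guesses PDF (illustration only).
        SimpleDetector magicStub = (stream, meta) -> "application/pdf";
        ShortCircuitDetector detector = new ShortCircuitDetector(List.of(magicStub));

        InputStream in = new ByteArrayInputStream(new byte[0]);
        Map<String, String> md = new HashMap<>();
        System.out.println(detector.detect(in, md)); // application/pdf

        md.put("Content-Type", "text/plain"); // only a hint; this stub ignores it
        System.out.println(detector.detect(in, md)); // application/pdf

        md.put(OVERRIDE_KEY, "application/x-sqlite3"); // illustrative override wins
        System.out.println(detector.detect(in, md)); // application/x-sqlite3
    }
}
```

Because the override is wired in explicitly by wrapping the chain, its precedence in this sketch does not depend on class-name ordering, and a plain "Content-Type" value remains something the regular detectors may treat as a hint.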



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
