[ https://issues.apache.org/jira/browse/TIKA-1528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14288297#comment-14288297 ]
Nick Burch commented on TIKA-1528:
----------------------------------

I'm struggling to visualise how this all fits together, sorry :(

Maybe, if you wouldn't mind... On/from the main sql parser jira / svn branch / github branch, any chance you could do a bit of pseudo-code / stubbed out classes / description / etc of how all of these parts will fit together? Parser, detection, extractor, table parsers etc.

That would certainly help me better see how it goes together; I might be able to offer some advice, but I'm fairly sure someone in our community will be able to offer great suggestions from it!

> Add an OverrideDetector that overrides other detectors
> ------------------------------------------------------
>
>                 Key: TIKA-1528
>                 URL: https://issues.apache.org/jira/browse/TIKA-1528
>             Project: Tika
>          Issue Type: Improvement
>            Reporter: Tim Allison
>            Priority: Minor
>
> While working on TIKA-1511, I found a need to bypass our current detection
> mechanism. I think that there are other use cases for this. The idea is
> that a client or a tika-internal call wants to specify the Content-Type for a
> document and bypass the regular mime detection chain.
> We currently have the TypeDetector that returns the "Content-Type" as
> specified in the Metadata, but there are two deficiencies in using that class
> for this purpose:
> * Content-Type is currently ambiguous: when it comes into a Parser or
> Detector, it could be used as a hint or as a direction. I'd like the
> OverrideDetector to use a different metadata key from our usual "Content-Type".
> * The ordering of the TypeDetector is based on alphabetic order of its class
> name. I'd like the OverrideDetector to be run first and then short
> circuit/bypass the other detectors.
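
For illustration only, here is a rough sketch of the behaviour the issue description asks for: an override detector that reads a dedicated metadata key, runs first, and short-circuits the remaining detectors. It is not Tika code. The `Detector` interface below is a simplified stand-in (string MIME types, a plain `Map` as metadata) rather than Tika's real `org.apache.tika.detect.Detector`, and the key name `X-Override-Content-Type`, the class names, and the "last non-default answer wins" fallback rule are all assumptions made up for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class OverrideDetectorSketch {

    // Hypothetical metadata key, deliberately distinct from "Content-Type".
    public static final String OVERRIDE_KEY = "X-Override-Content-Type";
    // Value a detector returns when it has no opinion.
    public static final String OCTET_STREAM = "application/octet-stream";

    // Simplified stand-in for a detector (not Tika's real interface).
    public interface Detector {
        String detect(byte[] content, Map<String, String> metadata);
    }

    // Returns the caller-specified type if the override key is set.
    public static class OverrideDetector implements Detector {
        @Override
        public String detect(byte[] content, Map<String, String> metadata) {
            String forced = metadata.get(OVERRIDE_KEY);
            return forced != null ? forced : OCTET_STREAM;
        }
    }

    // Runs the override detector first and bypasses the others if it answers.
    public static class OverrideFirstChain implements Detector {
        private final Detector override = new OverrideDetector();
        private final List<Detector> others;

        public OverrideFirstChain(List<Detector> others) {
            this.others = new ArrayList<>(others);
        }

        @Override
        public String detect(byte[] content, Map<String, String> metadata) {
            String forced = override.detect(content, metadata);
            if (!OCTET_STREAM.equals(forced)) {
                return forced; // short-circuit: regular detectors never run
            }
            // Assumed fallback rule: the last non-default answer wins.
            String best = OCTET_STREAM;
            for (Detector d : others) {
                String type = d.detect(content, metadata);
                if (!OCTET_STREAM.equals(type)) {
                    best = type;
                }
            }
            return best;
        }
    }

    public static void main(String[] args) {
        // A toy "regular" detector that always claims text/plain.
        Detector regular = (content, metadata) -> "text/plain";
        Detector chain = new OverrideFirstChain(Arrays.asList(regular));

        Map<String, String> metadata = new HashMap<>();
        System.out.println(chain.detect(new byte[0], metadata));

        metadata.put(OVERRIDE_KEY, "application/x-sqlite3");
        System.out.println(chain.detect(new byte[0], metadata));
    }
}
```

Keeping the override in its own key means "Content-Type" can remain a hint for the regular detectors, while the explicit ordering in the chain avoids relying on class-name sorting.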