[ https://issues.apache.org/jira/browse/TIKA-1528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14288086#comment-14288086 ]
Nick Burch commented on TIKA-1528:
----------------------------------

Ah, right, I think I get it. The SQLiteParser knows it has a table, and knows that needs the JDBCTableParser, but wants to go via an Extractor in order for each table to be treated individually if required, is that it?

If so, since you control both ends, you could always cheat... Pop the table onto the TikaInputStream as an open container, provide an empty byte array as data, put the table mimetype on the metadata along with the table name as the resource name, hand that off to the EmbeddedDocumentExtractor, and wait for that special TikaInputStream to appear at the table parser. With no data, the other parsers will decline to do anything, so the mimetype on the metadata will win and your table parser will get the TikaInputStream + you then grab the real table details off the open container.

All depends on if you think a parser which didn't know about the special jdbc table connection thingy would ever be able to do something useful with the table or not?

> Add an OverrideDetector that overrides other detectors
> ------------------------------------------------------
>
>                 Key: TIKA-1528
>                 URL: https://issues.apache.org/jira/browse/TIKA-1528
>             Project: Tika
>          Issue Type: Improvement
>            Reporter: Tim Allison
>            Priority: Minor
>
> While working on TIKA-1511, I found a need to bypass our current detection
> mechanism. I think that there are other use cases for this. The idea is
> that a client or a tika-internal call wants to specify the Content-Type for a
> document and bypass the regular mime detection chain.
> We currently have the TypeDetector that returns the "Content-Type" as
> specified in the Metadata, but there are two deficiencies in using that class
> for this purpose:
> * Content-Type is ambiguous, currently, when it comes into a Parser or
> Detector, it could be used as a hint or as a direction. I'd like the
> OverrideDetector to use a different metadata key from our usual "Content-Type".
> * The ordering of the TypeDetector is based on alphabetic order of its class
> name. I'd like the OverrideDetector to be run first and then short
> circuit/bypass the other detectors.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
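
The two requests in the issue description above (a dedicated metadata key, and an override that runs first and bypasses the other detectors) can be illustrated with a minimal sketch. This is plain Java and does not use Tika's real Detector interface or its composite detection logic. The names SimpleDetector, OVERRIDE_KEY, "X-Override-Content-Type" and the "application/x-jdbc-table" type are all hypothetical and chosen only for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class OverrideDetectorSketch {

    // Hypothetical metadata key, deliberately distinct from "Content-Type".
    static final String OVERRIDE_KEY = "X-Override-Content-Type";
    // Fallback when no detector recognises the input.
    static final String OCTET_STREAM = "application/octet-stream";

    /** Simplified stand-in for a detector: returns a type, or null to decline. */
    interface SimpleDetector {
        String detect(byte[] data, Map<String, String> metadata);
    }

    /** Toy magic-byte detector that recognises the SQLite file header. */
    static final SimpleDetector SQLITE_MAGIC = (data, metadata) -> {
        byte[] magic = "SQLite format 3".getBytes(StandardCharsets.US_ASCII);
        boolean matches = data.length >= magic.length
                && Arrays.equals(Arrays.copyOf(data, magic.length), magic);
        return matches ? "application/x-sqlite3" : null;
    };

    /** Checks the override key first; only falls through when it is absent. */
    static String detect(List<SimpleDetector> chain, byte[] data,
                         Map<String, String> metadata) {
        String override = metadata.get(OVERRIDE_KEY);
        if (override != null) {
            return override; // short circuit: the regular chain never runs
        }
        for (SimpleDetector d : chain) {
            String type = d.detect(data, metadata);
            if (type != null) {
                return type;
            }
        }
        return OCTET_STREAM;
    }

    public static void main(String[] args) {
        List<SimpleDetector> chain = Arrays.asList(SQLITE_MAGIC);
        byte[] empty = new byte[0]; // no data, as in the "cheat" described above

        // Without an override, empty data falls through to the fallback type.
        Map<String, String> plain = new HashMap<>();
        System.out.println(detect(chain, empty, plain));

        // With an override, the declared type wins without consulting the chain.
        Map<String, String> overridden = new HashMap<>();
        overridden.put(OVERRIDE_KEY, "application/x-jdbc-table"); // hypothetical type
        System.out.println(detect(chain, empty, overridden));
    }
}
```

Keeping the override under its own key means a "Content-Type" value supplied as a hint is never mistaken for a direction, which is the ambiguity the first bullet describes.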