[
https://issues.apache.org/jira/browse/TIKA-447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13036931#comment-13036931
]
Jukka Zitting commented on TIKA-447:
------------------------------------
I think we are pretty much done with this issue already.
Before closing it, I'd like to move the new classes from org.apache.tika.detect
(o.a.t.detect) into the appropriate org.apache.tika.parser subpackages in
tika-parsers. That way the detection logic sits closer to the related parser
classes, and we don't have to worry about OSGi split-package warnings (tika-core
already provides o.a.t.detect, so tika-parsers shouldn't add classes to it).
> Container aware mimetype detection
> ----------------------------------
>
> Key: TIKA-447
> URL: https://issues.apache.org/jira/browse/TIKA-447
> Project: Tika
> Issue Type: New Feature
> Components: mime
> Affects Versions: 0.7
> Reporter: Nick Burch
> Attachments: TIKA-447-TikaInputStream.patch,
> TikaContainerDetection.patch
>
>
> As discussed on the dev list, Tika should ideally have a configurable way to
> process container-based formats (e.g. ZIP and OLE2 files) when trying to
> detect the correct mime type for a document.
>
> This needs to be configurable, because some people won't want Tika to do all
> the work of parsing the whole file when they're not interested in knowing
> exactly what's in it.
>
> Once we have gone to the trouble of opening and parsing the container file,
> we should try to keep the open container around to speed up parsing of its
> contents.
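The two requirements in the description (container inspection is optional, and an opened container is kept for later parsing) can be sketched as below. This is a minimal illustration using only java.util.zip, not Tika's Detector API or the code in the attached patches: the class ContainerAwareZipDetector, its Result type, the inspectContainers switch and the zipOf helper are hypothetical names, and the two container rules (an ODF "mimetype" entry, a JAR manifest) are just example signatures. It also assumes the whole document fits in memory.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

/** Hypothetical sketch (not Tika's API): optional ZIP container inspection that keeps unpacked entries. */
public class ContainerAwareZipDetector {

    /** Detected type plus the entries read while detecting (empty if the container was not opened). */
    public static final class Result {
        public final String mediaType;
        public final Map<String, byte[]> entries;

        Result(String mediaType, Map<String, byte[]> entries) {
            this.mediaType = mediaType;
            this.entries = Collections.unmodifiableMap(entries);
        }
    }

    private final boolean inspectContainers; // hypothetical config switch

    public ContainerAwareZipDetector(boolean inspectContainers) {
        this.inspectContainers = inspectContainers;
    }

    public Result detect(byte[] data) throws IOException {
        // Cheap magic-byte check for a ZIP local file header: "PK\003\004".
        boolean isZip = data.length >= 4 && data[0] == 'P' && data[1] == 'K'
                && data[2] == 3 && data[3] == 4;
        if (!isZip) {
            return new Result("application/octet-stream", new LinkedHashMap<>());
        }
        if (!inspectContainers) {
            // Caller opted out of the expensive step: report only the generic container type.
            return new Result("application/zip", new LinkedHashMap<>());
        }
        // Unpack every entry once and keep the bytes, so a parser need not re-read the container.
        Map<String, byte[]> entries = new LinkedHashMap<>();
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(data))) {
            byte[] buf = new byte[8192];
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                int n;
                while ((n = zip.read(buf)) != -1) { // -1 marks the end of the current entry
                    out.write(buf, 0, n);
                }
                entries.put(entry.getName(), out.toByteArray());
            }
        }
        // ODF packages store their own media type in a "mimetype" entry.
        byte[] odfType = entries.get("mimetype");
        if (odfType != null) {
            return new Result(new String(odfType, StandardCharsets.US_ASCII).trim(), entries);
        }
        // A JAR is a ZIP that carries a manifest.
        if (entries.containsKey("META-INF/MANIFEST.MF")) {
            return new Result("application/java-archive", entries);
        }
        return new Result("application/zip", entries);
    }

    /** Demo helper: builds an in-memory ZIP from alternating name/content pairs. */
    static byte[] zipOf(String... nameContentPairs) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            for (int i = 0; i + 1 < nameContentPairs.length; i += 2) {
                zip.putNextEntry(new ZipEntry(nameContentPairs[i]));
                zip.write(nameContentPairs[i + 1].getBytes(StandardCharsets.UTF_8));
                zip.closeEntry();
            }
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] odt = zipOf("mimetype", "application/vnd.oasis.opendocument.text",
                           "content.xml", "<office:document-content/>");
        System.out.println(new ContainerAwareZipDetector(false).detect(odt).mediaType); // prints application/zip
        System.out.println(new ContainerAwareZipDetector(true).detect(odt).mediaType);  // prints application/vnd.oasis.opendocument.text
    }
}
```

With the switch off, only the four magic bytes are examined; with it on, the caller pays for a full unpack but gets the entries back in the Result, which is the "keep the open container around" idea in its simplest in-memory form.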