Container aware mimetype detection
----------------------------------

                 Key: TIKA-447
                 URL: https://issues.apache.org/jira/browse/TIKA-447
             Project: Tika
          Issue Type: New Feature
          Components: mime
    Affects Versions: 0.7
            Reporter: Nick Burch
         Attachments: TikaContainerDetection.patch

As discussed on the dev list, Tika should ideally have a configurable way to 
process container based formats (eg zip files and ole2 files) when trying to 
detect the correct mime type for a document.

This needs to be configurable, because some people won't want Tika to have to 
do all the work of parsing the whole file when they're not interested in 
knowing exactly what's in it

Once we have gone to the trouble of opening and parsing the container file, we 
should try to keep the open container around to speed up parsing of the contents

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to