Tim Allison created TIKA-2853:
---------------------------------
Summary: Consider applying NaiveBayes or similar simple ML to streaming zip detector
Key: TIKA-2853
URL: https://issues.apache.org/jira/browse/TIKA-2853
Project: Tika
Issue Type: Task
Reporter: Tim Allison
Whether we use actual ML or build rules from patterns we see in the data, it
would be useful to gather features (field names, directory names, etc.) from
zipfile-based file types in our regression corpus to (potentially) improve
the efficiency of MIME detection.
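To make the idea concrete, here is a minimal sketch of a multinomial Naive Bayes classifier with add-one (Laplace) smoothing whose features are tokens derived from zip entry names: each directory component (dir:word), the file name (file:document.xml), and the extension (ext:xml). Everything here is hypothetical and is not part of Tika's API: the class ZipEntryNaiveBayes, its methods, the token scheme, and the tiny training lists are made up for illustration and are not drawn from the regression corpus. In a streaming setting the entry names could be collected with java.util.zip.ZipInputStream.getNextEntry() and ZipEntry.getName().

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch, not part of Tika: multinomial Naive Bayes over tokens
// derived from zip entry names, with Laplace (add-one) smoothing.
class ZipEntryNaiveBayes {
    private final Map<String, Map<String, Integer>> featureCounts = new HashMap<>(); // label -> feature -> count
    private final Map<String, Integer> totalFeatures = new HashMap<>();               // label -> feature occurrences
    private final Map<String, Integer> docCounts = new HashMap<>();                   // label -> training archives
    private final Set<String> vocabulary = new HashSet<>();
    private int totalDocs = 0;

    // "word/document.xml" -> [dir:word, file:document.xml, ext:xml]; "META-INF/" -> [dir:META-INF]
    static List<String> features(List<String> entryNames) {
        List<String> out = new ArrayList<>();
        for (String name : entryNames) {
            boolean isDir = name.endsWith("/");
            String[] parts = name.split("/"); // trailing empty strings are dropped by split
            int dirParts = isDir ? parts.length : parts.length - 1;
            for (int i = 0; i < dirParts; i++) {
                if (!parts[i].isEmpty()) out.add("dir:" + parts[i]);
            }
            if (!isDir && parts.length > 0 && !parts[parts.length - 1].isEmpty()) {
                String file = parts[parts.length - 1];
                out.add("file:" + file);
                int dot = file.lastIndexOf('.');
                if (dot > 0 && dot < file.length() - 1) out.add("ext:" + file.substring(dot + 1));
            }
        }
        return out;
    }

    // Adds one labelled archive (its list of entry names) to the counts.
    void train(String label, List<String> entryNames) {
        totalDocs++;
        docCounts.merge(label, 1, Integer::sum);
        Map<String, Integer> counts = featureCounts.computeIfAbsent(label, k -> new HashMap<>());
        for (String f : features(entryNames)) {
            counts.merge(f, 1, Integer::sum);
            totalFeatures.merge(label, 1, Integer::sum);
            vocabulary.add(f);
        }
    }

    // Returns the label with the highest log-posterior; features never seen in training are ignored.
    String classify(List<String> entryNames) {
        List<String> feats = features(entryNames);
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, Integer> e : docCounts.entrySet()) {
            String label = e.getKey();
            double score = Math.log(e.getValue() / (double) totalDocs); // log prior
            Map<String, Integer> counts = featureCounts.get(label);
            int total = totalFeatures.getOrDefault(label, 0);
            for (String f : feats) {
                if (!vocabulary.contains(f)) continue;
                int c = counts.getOrDefault(f, 0);
                score += Math.log((c + 1.0) / (total + vocabulary.size())); // smoothed log likelihood
            }
            if (score > bestScore) {
                bestScore = score;
                best = label;
            }
        }
        return best;
    }

    // Toy training data for illustration only; real features would come from the regression corpus.
    static ZipEntryNaiveBayes demoModel() {
        ZipEntryNaiveBayes nb = new ZipEntryNaiveBayes();
        nb.train("application/vnd.openxmlformats-officedocument.wordprocessingml.document",
                List.of("[Content_Types].xml", "_rels/.rels", "word/document.xml", "word/styles.xml", "docProps/core.xml"));
        nb.train("application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
                List.of("[Content_Types].xml", "_rels/.rels", "xl/workbook.xml", "xl/worksheets/sheet1.xml", "docProps/core.xml"));
        nb.train("application/java-archive",
                List.of("META-INF/MANIFEST.MF", "org/example/Main.class"));
        return nb;
    }

    public static void main(String[] args) {
        ZipEntryNaiveBayes nb = demoModel();
        System.out.println(nb.classify(List.of("[Content_Types].xml", "word/document.xml")));
        System.out.println(nb.classify(List.of("META-INF/MANIFEST.MF", "com/acme/App.class")));
    }
}
```

On the toy data, the first query scores highest for the wordprocessingml label (dir:word and file:document.xml were only seen there) and the second for application/java-archive. Because the score is a running sum over entries, a streaming detector could in principle update it entry by entry; whether that beats hand-built rules on real files is exactly what the corpus features would need to show.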