Tim Allison created TIKA-2853:
---------------------------------

             Summary: Consider applying NaiveBayes or similar simple ML to streaming zip detector
                 Key: TIKA-2853
                 URL: https://issues.apache.org/jira/browse/TIKA-2853
             Project: Tika
          Issue Type: Task
            Reporter: Tim Allison


Whether we use actual ML or hand-build rules from patterns we see in the
data, it would be useful to gather features (field names, directory names,
etc.) from the zip-based file types in our regression corpus to (potentially)
improve the efficiency of MIME detection.
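
As a rough illustration of the idea, the sketch below gathers entry-name and
top-level-directory features from zip files and feeds them to a small
Bernoulli naive Bayes classifier. It is only a sketch under stated
assumptions: it is written in Python for brevity rather than against Tika's
Java code, the names entry_features, NaiveBayes and make_zip are hypothetical,
and it reads the zip central directory via the standard zipfile module
instead of consuming entries in stream order as a streaming detector would.

```python
import io
import math
import zipfile
from collections import Counter, defaultdict


def make_zip(names):
    """Build a tiny in-memory zip with empty entries (stand-in for corpus files)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name in names:
            zf.writestr(name, b"")
    return buf.getvalue()


def entry_features(zip_bytes):
    """Collect binary features: each entry name and each top-level directory."""
    feats = set()
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        for name in zf.namelist():
            feats.add("name=" + name)
            if "/" in name:
                feats.add("dir=" + name.split("/", 1)[0])
    return feats


class NaiveBayes:
    """Bernoulli naive Bayes over feature presence, with Laplace smoothing."""

    def __init__(self):
        self.doc_counts = Counter()              # label -> training files seen
        self.feat_counts = defaultdict(Counter)  # label -> feature -> files with it
        self.vocab = set()                       # every feature seen in training

    def train(self, feats, label):
        self.doc_counts[label] += 1
        self.feat_counts[label].update(feats)
        self.vocab.update(feats)

    def predict(self, feats):
        total = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.doc_counts.items():
            score = math.log(n / total)  # log prior
            for f in self.vocab:
                # Smoothed probability that a file of this type has feature f.
                p = (self.feat_counts[label][f] + 1) / (n + 2)
                score += math.log(p if f in feats else 1.0 - p)
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

With real data, train() would be called once per labelled file in the
regression corpus; features never seen in training are simply ignored at
prediction time. Whether a learned model beats hand-written rules here is
exactly the open question of this issue.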



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
