Richard Davidson created TIKA-2474:
--------------------------------------

             Summary: Mime type is vnd.apple.unknown.13 for valid 
keynote file
                 Key: TIKA-2474
                 URL: https://issues.apache.org/jira/browse/TIKA-2474
             Project: Tika
          Issue Type: Bug
            Reporter: Richard Davidson


When I try to detect the sub mime type for the attached keynote file I get 
vnd.apple.unknown.13. 

The file which handles Keynote files in Tika is 
https://github.com/apache/tika/blob/master/tika-parsers/src/main/java/org/apache/tika/parser/iwork/iwana/IWork13PackageParser.java
and the specific code is:

{code}
        public static MediaType detect(ZipFile zipFile) {
            ZipArchiveEntry entry = zipFile.getEntry("Index/MasterSlide.iwa");
            if (zipFile.getEntry("Index/MasterSlide.iwa") != null ||
                    zipFile.getEntry("Index/Slide.iwa") != null) {
                return KEYNOTE13.getType();
            }
            //TODO: figure out how to distinguish numbers from pages
            return UNKNOWN13.getType();
        }

{code}

My file does not contain Index/Slide.iwa or Index/MasterSlide.iwa, but it does 
contain multiple files like MasterSlide-3857.iwa and Slide-3885.iwa. I think 
the detection logic should use a regex to check for MasterSlide-*.iwa or 
Slide-*.iwa.
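
As a rough sketch of the regex check proposed above (not a tested patch), the matching could look something like the following. KeynoteEntryMatcher and looksLikeKeynote are hypothetical names, not existing Tika code. The pattern assumes that the numbered files sit under Index/ like the unnumbered ones and that the suffix is a hyphen followed by digits. I have not checked whether Numbers or Pages packages can also contain such entries.

```java
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Tika: decides from zip entry names alone.
final class KeynoteEntryMatcher {

    // Assumption: numbered files live under Index/ and use a "-<digits>"
    // suffix, e.g. Index/Slide-3885.iwa. The unnumbered names still match.
    private static final Pattern SLIDE_ENTRY =
            Pattern.compile("Index/(?:MasterSlide|Slide)(?:-\\d+)?\\.iwa");

    private KeynoteEntryMatcher() {
    }

    // Returns true if any entry name is a Slide or MasterSlide file,
    // with or without a numeric suffix.
    static boolean looksLikeKeynote(List<String> entryNames) {
        for (String name : entryNames) {
            if (SLIDE_ENTRY.matcher(name).matches()) {
                return true;
            }
        }
        return false;
    }
}
```

In detect(ZipFile) the names would presumably be collected by iterating zipFile.getEntries() and calling getName() on each entry, returning KEYNOTE13.getType() when the helper returns true.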

If people agree with this approach I can submit a pull request.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
