[ https://issues.apache.org/jira/browse/TIKA-2474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Richard Davidson updated TIKA-2474:
-----------------------------------
    Priority: Minor  (was: Major)

> Mime type should is vnd.apple.unknown.13 for valid keynote file
> ---------------------------------------------------------------
>
>                 Key: TIKA-2474
>                 URL: https://issues.apache.org/jira/browse/TIKA-2474
>             Project: Tika
>          Issue Type: Bug
>            Reporter: Richard Davidson
>            Priority: Minor
>         Attachments: Untitled.key
>
>
> When I try to detect the sub mime type for the attached keynote file I get
> vnd.apple.unknown.13.
> I think the code which handles the keynote files in Tika is
> https://github.com/apache/tika/blob/master/tika-parsers/src/main/java/org/apache/tika/parser/iwork/iwana/IWork13PackageParser.java
> and the specific code is:
> {code}
> public static MediaType detect(ZipFile zipFile) {
>     ZipArchiveEntry entry = zipFile.getEntry("Index/MasterSlide.iwa");
>     if (zipFile.getEntry("Index/MasterSlide.iwa") != null ||
>             zipFile.getEntry("Index/Slide.iwa") != null) {
>         return KEYNOTE13.getType();
>     }
>     //TODO: figure out how to distinguish numbers from pages
>     return UNKNOWN13.getType();
> }
> {code}
> My file does not contain an Index/Slide.iwa or Index/MasterSlide.iwa, but it
> does contain multiple files like MasterSlide-3857.iwa and Slide-3885.iwa. I
> think the detection logic should use a regex to check for MasterSlide-*.iwa
> or Slide-*.iwa.
> If people agree with this approach I can submit a pull request.
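
For illustration, the regex check proposed above could look roughly like the sketch below. This is not Tika code: the class KeynoteEntryMatcher and the method looksLikeKeynote are hypothetical names, and the sketch works on plain entry-name strings rather than on a ZipFile so that it runs without any dependencies. It also assumes the numeric suffix is always digits, as in the two examples from the report, and it treats the Index/ prefix as optional because the report does not say which directory the numbered files live in.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Tika: decides from zip entry names alone
// whether an iWork '13 package looks like a Keynote file.
class KeynoteEntryMatcher {

    // Matches Slide.iwa and MasterSlide.iwa, plus numbered variants such as
    // Slide-3885.iwa and MasterSlide-3857.iwa. The "Index/" prefix is optional
    // here (assumption: the report does not state where the numbered files sit).
    private static final Pattern KEYNOTE_ENTRY =
            Pattern.compile("^(?:Index/)?(?:Master)?Slide(?:-\\d+)?\\.iwa$");

    // Returns true as soon as one entry name matches the Keynote pattern.
    static boolean looksLikeKeynote(Iterable<String> entryNames) {
        for (String name : entryNames) {
            if (KEYNOTE_ENTRY.matcher(name).matches()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Entry names modelled on the ones mentioned in the report.
        List<String> names = Arrays.asList(
                "Index/MasterSlide-3857.iwa",
                "Index/Slide-3885.iwa");
        System.out.println(looksLikeKeynote(names)); // prints "true"
    }
}
```

In detect(ZipFile), the same pattern would be applied to the name of each entry while iterating over the archive, returning KEYNOTE13.getType() on the first match and falling through to UNKNOWN13.getType() otherwise.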