[
https://issues.apache.org/jira/browse/TIKA-2474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Richard Davidson updated TIKA-2474:
-----------------------------------
Description:
When I try to detect the MIME subtype of the attached Keynote file, I get
vnd.apple.unknown.13.
The class that handles Keynote files in Tika is
https://github.com/apache/tika/blob/master/tika-parsers/src/main/java/org/apache/tika/parser/iwork/iwana/IWork13PackageParser.java
and the relevant code is:
{code}
public static MediaType detect(ZipFile zipFile) {
    ZipArchiveEntry entry = zipFile.getEntry("Index/MasterSlide.iwa");
    if (zipFile.getEntry("Index/MasterSlide.iwa") != null ||
            zipFile.getEntry("Index/Slide.iwa") != null) {
        return KEYNOTE13.getType();
    }
    //TODO: figure out how to distinguish numbers from pages
    return UNKNOWN13.getType();
}
{code}
My file does not contain an Index/Slide.iwa or Index/MasterSlide.iwa entry, but it does
contain multiple files like MasterSlide-3857.iwa and Slide-3885.iwa. I think
the detection logic should use a regex to check for MasterSlide-*.iwa or
Slide-*.iwa.
If people agree with this approach I can submit a pull request.
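A minimal sketch of the proposed check follows. It assumes that slide entries are named Slide.iwa or MasterSlide.iwa with an optional numeric suffix (as in Slide-3885.iwa) and may sit in any directory, such as Index/. The class Keynote13Detection and the method looksLikeKeynote are hypothetical, not Tika API, and the sketch reads entry names through java.util.zip.ZipFile rather than the commons-compress ZipFile that IWork13PackageParser.detect receives. The plain Index/Slide.iwa and Index/MasterSlide.iwa names still match, so files that are detected today would keep working.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

public class Keynote13Detection {

    // Hypothetical pattern: Slide.iwa or MasterSlide.iwa, optionally with a
    // numeric suffix (e.g. Slide-3885.iwa, MasterSlide-3857.iwa), in any directory.
    private static final Pattern KEYNOTE_ENTRY =
            Pattern.compile("(?:.*/)?(?:Master)?Slide(?:-\\d+)?\\.iwa");

    /** Returns true if any of the given zip entry names looks like a Keynote slide. */
    static boolean looksLikeKeynote(Iterable<String> entryNames) {
        for (String name : entryNames) {
            if (KEYNOTE_ENTRY.matcher(name).matches()) {
                return true;
            }
        }
        return false;
    }

    /** Same check, reading entry names from a java.util.zip.ZipFile. */
    static boolean looksLikeKeynote(ZipFile zipFile) {
        return zipFile.stream()
                .map(ZipEntry::getName)
                .anyMatch(name -> KEYNOTE_ENTRY.matcher(name).matches());
    }

    public static void main(String[] args) throws IOException {
        List<String> numbered = Arrays.asList(
                "Index/Document.iwa", "Index/MasterSlide-3857.iwa", "Index/Slide-3885.iwa");
        List<String> noSlides = Arrays.asList("Index/Document.iwa", "Index/Metadata.iwa");
        System.out.println(looksLikeKeynote(numbered)); // true
        System.out.println(looksLikeKeynote(noSlides)); // false

        // Optionally check a real .key file passed on the command line.
        if (args.length > 0) {
            try (ZipFile zip = new ZipFile(args[0])) {
                System.out.println(args[0] + ": " + looksLikeKeynote(zip));
            }
        }
    }
}
```

In detect(), a check like this would replace the two getEntry lookups; with commons-compress the names would come from iterating zipFile.getEntries() instead.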
> Mime type is vnd.apple.unknown.13 for valid keynote file
> --------------------------------------------------------
>
> Key: TIKA-2474
> URL: https://issues.apache.org/jira/browse/TIKA-2474
> Project: Tika
> Issue Type: Bug
> Reporter: Richard Davidson
> Attachments: Untitled.key
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)