[
https://issues.apache.org/jira/browse/TIKA-402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Martijn van Groningen updated TIKA-402:
---------------------------------------
Attachment: testNumbers.numbers
iwork.patch
I've added Numbers support to the patch.
Pages and Numbers both use the same content file (index.xml) inside the
compressed file (*.pages / *.numbers), which makes it harder to tell the two
formats apart. I've solved this by comparing the root element, which differs
between the two formats. I'm not really happy with this solution, but it seems
to be the only one available. If someone has a nicer solution, please share. A
Keynote file has index.apxl as its content file, which makes its format much
easier to determine.
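
For readers who want to see the idea outside the patch, here is a minimal
sketch of the detection described above. It is not the code in iwork.patch.
The class name IWorkSniffer, the Format enum and both namespace URIs are
assumptions made for illustration. The Keynote entry name is taken to be
index.apxl. The sketch scans the zip entries, treats index.apxl as Keynote, and
for index.xml reads only the root element to tell Pages from Numbers.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Hypothetical helper, not part of Tika or of the attached patch. */
public class IWorkSniffer {

    public enum Format { KEYNOTE, PAGES, NUMBERS, UNKNOWN }

    // Assumed root-element namespaces; verify against real sample files.
    static final String PAGES_NS = "http://developer.apple.com/namespaces/sl";
    static final String NUMBERS_NS = "http://developer.apple.com/namespaces/ls";

    /** Walks the zip entries until a known content file is found. */
    public static Format detect(InputStream in)
            throws IOException, XMLStreamException {
        ZipInputStream zip = new ZipInputStream(in);
        ZipEntry entry;
        while ((entry = zip.getNextEntry()) != null) {
            String name = entry.getName();
            if (name.equals("index.apxl")) {
                // The entry name alone identifies Keynote.
                return Format.KEYNOTE;
            }
            if (name.equals("index.xml")) {
                // Pages and Numbers share this name, so inspect the root.
                return fromRootElement(zip);
            }
        }
        return Format.UNKNOWN;
    }

    /** Reads up to the first start tag and maps its namespace to a format. */
    static Format fromRootElement(InputStream xml) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        XMLStreamReader reader = factory.createXMLStreamReader(xml);
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    String ns = reader.getNamespaceURI();
                    if (PAGES_NS.equals(ns)) {
                        return Format.PAGES;
                    }
                    if (NUMBERS_NS.equals(ns)) {
                        return Format.NUMBERS;
                    }
                    return Format.UNKNOWN;
                }
            }
            return Format.UNKNOWN;
        } finally {
            reader.close();
        }
    }
}
```

Because the parser stops at the first start tag, only the beginning of
index.xml is read, so the extra check should stay cheap even for large
documents.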
> Support for Keynote and Pages documents
> ---------------------------------------
>
> Key: TIKA-402
> URL: https://issues.apache.org/jira/browse/TIKA-402
> Project: Tika
> Issue Type: New Feature
> Components: parser
> Reporter: Jukka Zitting
> Attachments: iwork.patch, iwork.patch, iwork.patch, iwork.patch,
> iwork.patch, testKeynote.key, testKeynote.key, testNumbers.numbers,
> testPages.pages
>
>
> It would be nice to have support for documents created by Apple's Keynote and
> Pages applications. Both file formats are described in
> http://developer.apple.com/mac/library/documentation/AppleApplications/Conceptual/iWork2-0_XML/Chapter01/Introduction.html.
> I'm not sure if there already are open source parser libraries for these
> formats or if we'd need to directly process the XML content.