[ https://issues.apache.org/jira/browse/TIKA-402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Martijn van Groningen updated TIKA-402:
---------------------------------------
Attachment: iwork.patch
Jukka, I did some refactoring in the newly attached patch in order to get rid
of the IWorkRootElementDetectContentHandler class. Basically, the IWorkParser
now parses only the relevant iWork XML files (I configured the parser to
recognize those XML documents by their root XML element). I also created an
IWorkPackageParser class that deals with the container file format
(*.key, *.pages, *.numbers). This way, if an iWork document is uncompressed or
somehow put in a different archive file, it can still be parsed.
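
To illustrate the split described above, here is a minimal sketch using only
JDK classes. It is not the code from iwork.patch: the class name
IWorkDispatchSketch, the method names, the namespace URIs, and the entry names
index.apxl and index.xml are all assumptions made for illustration. The XML
level looks only at the root element, and the package level merely locates the
content entry in the ZIP container and delegates, so a bare (uncompressed) XML
file takes the same path.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical sketch; not the IWorkParser/IWorkPackageParser from the patch.
public class IWorkDispatchSketch {

    // Assumed namespace URIs of the iWork XML root elements.
    static final String KEYNOTE_NS = "http://developer.apple.com/namespaces/keynote2";
    static final String PAGES_NS = "http://developer.apple.com/namespaces/sl";
    static final String NUMBERS_NS = "http://developer.apple.com/namespaces/ls";

    /** XML level: classify a bare XML stream by its root element only. */
    public static String kindOf(InputStream xml) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false); // no DTD processing
        XMLStreamReader reader = factory.createXMLStreamReader(xml);
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    String ns = reader.getNamespaceURI();
                    String name = reader.getLocalName();
                    if (KEYNOTE_NS.equals(ns) && "presentation".equals(name)) return "keynote";
                    if (PAGES_NS.equals(ns) && "document".equals(name)) return "pages";
                    if (NUMBERS_NS.equals(ns) && "document".equals(name)) return "numbers";
                    return "unknown"; // the first start element is the root; stop here
                }
            }
            return "unknown";
        } finally {
            reader.close(); // releases the reader, not the underlying stream
        }
    }

    /** Package level: find the content entry in a ZIP container and delegate. */
    public static String kindOfPackage(InputStream container)
            throws IOException, XMLStreamException {
        ZipInputStream zip = new ZipInputStream(container);
        ZipEntry entry;
        while ((entry = zip.getNextEntry()) != null) {
            String name = entry.getName();
            // Assumed names of the main content entry.
            if (name.equals("index.apxl") || name.equals("index.xml")) {
                return kindOf(zip);
            }
        }
        return "unknown";
    }

    /** Test helper: builds a one-entry ZIP archive in memory. */
    public static byte[] zipOf(String entryName, String xml) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(new ZipEntry(entryName));
            zip.write(xml.getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<key:presentation xmlns:key=\"" + KEYNOTE_NS + "\"/>";
        // The same document, once packaged and once as a bare XML stream.
        System.out.println(kindOfPackage(new ByteArrayInputStream(zipOf("index.apxl", xml))));
        System.out.println(kindOf(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8))));
    }
}
```

Because the package level only unwraps the container, a document that arrives
inside some other archive format would just need a different unwrapping step
in front of the same root-element check.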
> Support for iWork documents
> ---------------------------
>
> Key: TIKA-402
> URL: https://issues.apache.org/jira/browse/TIKA-402
> Project: Tika
> Issue Type: New Feature
> Components: parser
> Reporter: Jukka Zitting
> Attachments: iwork.patch, iwork.patch, iwork.patch, iwork.patch,
> iwork.patch, iwork.patch, testKeynote.key, testKeynote.key,
> testNumbers.numbers, testPages.pages
>
>
> It would be nice to have support for documents created by Apple's Keynote and
> Pages applications. Both file formats are described in
> http://developer.apple.com/mac/library/documentation/AppleApplications/Conceptual/iWork2-0_XML/Chapter01/Introduction.html.
> I'm not sure if there already are open source parser libraries for these
> formats or if we'd need to directly process the XML content.