[ https://issues.apache.org/jira/browse/TIKA-402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12873708#action_12873708 ]
Jukka Zitting commented on TIKA-402:
------------------------------------
See revisions 949775, 949776 and 949795 for a few improvements I made.
I'm wondering if it would make sense to use our existing generic XML root
element detection mechanism for these file formats, especially since it's
possible for an iWork document to be stored as a directory instead of as a zip
archive. This way we wouldn't need the explicit
IWorkRootElementDetectContentHandler class and IWorkParser would be just a
special case of the PackageParser class. On the other hand, we'd then need to
turn the current format-specific content handlers into separate Tika Parser
implementations. Not sure if the benefits are worth the trouble.
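
To make the idea concrete, here is a rough sketch of root-element-based
detection using only the JDK's SAX API. It is not Tika's actual detection
code: the class name RootElementSketch, its methods, and the
namespace-to-type table are all hypothetical, and the namespace URIs and
media type names in the table are assumptions that would need to be checked
against real iWork files.

```java
import java.io.InputStream;
import java.util.Map;
import javax.xml.namespace.QName;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical sketch: map the root element of an XML stream to a media type. */
public class RootElementSketch {

    // Assumed (namespace, local name) pairs and media type names; verify before use.
    private static final Map<QName, String> ASSUMED_TYPES = Map.of(
            new QName("http://developer.apple.com/namespaces/keynote2", "presentation"),
            "application/vnd.apple.keynote",
            new QName("http://developer.apple.com/namespaces/sl", "document"),
            "application/vnd.apple.pages",
            new QName("http://developer.apple.com/namespaces/ls", "document"),
            "application/vnd.apple.numbers");

    /** Records the first element seen, then aborts the parse. */
    private static final class RootHandler extends DefaultHandler {
        QName root; // stays null if parsing fails before any element starts

        @Override
        public void startElement(String uri, String local, String qName, Attributes atts)
                throws SAXException {
            root = new QName(uri, local);
            throw new SAXException("root element found, stopping early");
        }
    }

    /** Returns the root element's qualified name, or null if the stream is not XML. */
    public static QName rootElement(InputStream in) {
        RootHandler handler = new RootHandler();
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true); // needed to see namespace URIs
            factory.newSAXParser().parse(in, handler);
        } catch (Exception e) {
            // Either our deliberate abort or a genuine parse/IO failure;
            // handler.root tells the two cases apart.
        }
        return handler.root;
    }

    /** Returns the assumed media type for the stream, or null if unrecognized. */
    public static String detectType(InputStream in) {
        QName root = rootElement(in);
        return root == null ? null : ASSUMED_TYPES.get(root);
    }
}
```

Since the check only needs an InputStream for the main XML entry, the same
code could in principle be fed from a zip entry or from a file inside a
document directory, which is the case that motivates the suggestion above.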
> Support for iWork documents
> ---------------------------
>
> Key: TIKA-402
> URL: https://issues.apache.org/jira/browse/TIKA-402
> Project: Tika
> Issue Type: New Feature
> Components: parser
> Reporter: Jukka Zitting
> Attachments: iwork.patch, iwork.patch, iwork.patch, iwork.patch,
> iwork.patch, testKeynote.key, testKeynote.key, testNumbers.numbers,
> testPages.pages
>
>
> It would be nice to have support for documents created by Apple's Keynote and
> Pages applications. Both file formats are described in
> http://developer.apple.com/mac/library/documentation/AppleApplications/Conceptual/iWork2-0_XML/Chapter01/Introduction.html.
> I'm not sure if there already are open source parser libraries for these
> formats or if we'd need to directly process the XML content.