[ 
https://issues.apache.org/jira/browse/TIKA-402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12873708#action_12873708
 ] 

Jukka Zitting commented on TIKA-402:
------------------------------------

See revisions 949775, 949776 and 949795 for a few improvements I made.

I'm wondering if it would make sense to use our existing generic XML root 
element detection mechanism for these file formats, especially since it's 
possible for an iWork document to be stored as a directory instead of as a zip 
archive. This way we wouldn't need the explicit 
IWorkRootElementDetectContentHandler class and IWorkParser would be just a 
special case of the PackageParser class. On the other hand, we'd then need to 
turn the current format-specific content handlers into separate Tika Parser 
implementations. Not sure if the benefits are worth the trouble.
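
To make the root element idea concrete, here is a rough sketch using only the
JDK's SAX API, not Tika's own detection classes. The class name
RootElementSketch, the namespace-to-type table and the fallback types are all
assumptions made for illustration; the real mapping would live in Tika's
existing type configuration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Map;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch: maps the namespace of an XML root element to a media type.
final class RootElementSketch {

    // Assumed table; the namespace URIs and type names are illustrative only.
    private static final Map<String, String> TYPES_BY_NAMESPACE = Map.of(
            "http://developer.apple.com/namespaces/keynote2", "application/vnd.apple.keynote",
            "http://developer.apple.com/namespaces/sl", "application/vnd.apple.pages",
            "http://developer.apple.com/namespaces/ls", "application/vnd.apple.numbers");

    // Records the namespace of the first element, then aborts the parse.
    private static final class RootHandler extends DefaultHandler {
        String rootNamespace; // stays null until a root element is seen

        @Override
        public void startElement(String uri, String localName, String qName,
                                 Attributes attributes) throws SAXException {
            rootNamespace = uri;
            throw new SAXException("root element seen, stopping early");
        }
    }

    static String detect(InputStream xml) {
        RootHandler handler = new RootHandler();
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.newSAXParser().parse(xml, handler);
        } catch (SAXException | IOException | ParserConfigurationException e) {
            // Either our own early stop or a genuine parse failure;
            // the handler state below tells the two cases apart.
        }
        if (handler.rootNamespace == null) {
            return "application/octet-stream"; // no root element found, not usable XML
        }
        // Unknown namespaces fall back to generic XML.
        return TYPES_BY_NAMESPACE.getOrDefault(handler.rootNamespace, "application/xml");
    }
}
```

Because detect() only needs an InputStream, the same check would work whether
the XML comes from a zip entry or from a file inside a document directory.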

> Support for iWork documents
> ---------------------------
>
>                 Key: TIKA-402
>                 URL: https://issues.apache.org/jira/browse/TIKA-402
>             Project: Tika
>          Issue Type: New Feature
>          Components: parser
>            Reporter: Jukka Zitting
>         Attachments: iwork.patch, iwork.patch, iwork.patch, iwork.patch, 
> iwork.patch, testKeynote.key, testKeynote.key, testNumbers.numbers, 
> testPages.pages
>
>
> It would be nice to have support for documents created by Apple's Keynote and 
> Pages applications. Both file formats are described in 
> http://developer.apple.com/mac/library/documentation/AppleApplications/Conceptual/iWork2-0_XML/Chapter01/Introduction.html.
> I'm not sure if there already are open source parser libraries for these 
> formats or if we'd need to directly process the XML content.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
