[ 
https://issues.apache.org/jira/browse/TIKA-402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Martijn van Groningen updated TIKA-402:
---------------------------------------

    Attachment: iwork.patch

Jukka, I did some refactoring in the newly attached patch in order to get rid 
of the IWorkRootElementDetectContentHandler class. The IWorkParser now parses 
only the relevant iWork XML files (I configured which XML documents go to the 
parser by their root XML element). I created an IWorkPackageParser class that 
deals with the container format file (*.keynote|pages|numbers). This way, if an 
iWork document is uncompressed or somehow put in a different archive file, it 
can still be parsed.
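
A rough sketch of that split, using only JDK classes and not the code from the 
patch. The class and method names (IWorkSketch, rootElement, 
rootElementsInZip) are hypothetical, and the ".xml"/".apxl" entry-name filter 
is an assumption made for illustration. One method identifies an XML document 
by its root element alone, so it works on a bare, uncompressed file; the other 
only walks the container and hands each candidate entry to the first.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.namespace.QName;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical illustration; not the classes from the attached patch.
class IWorkSketch {

    /** XML level: return the qualified name of the root element, or null. */
    static QName rootElement(InputStream xml) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Do not process DTDs; only the first start tag is of interest.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, Boolean.FALSE);
        XMLStreamReader reader = factory.createXMLStreamReader(xml);
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    return reader.getName();
                }
            }
            return null;
        } finally {
            reader.close(); // releases the reader, not the underlying stream
        }
    }

    /** Container level: map each XML-looking zip entry to its root element. */
    static Map<String, QName> rootElementsInZip(InputStream container)
            throws IOException, XMLStreamException {
        Map<String, QName> roots = new LinkedHashMap<>();
        ZipInputStream zip = new ZipInputStream(container);
        ZipEntry entry;
        while ((entry = zip.getNextEntry()) != null) {
            String name = entry.getName();
            // Assumption: the document XML lives in *.xml or *.apxl entries.
            if (name.endsWith(".xml") || name.endsWith(".apxl")) {
                // Shield the archive stream so the XML layer cannot close it.
                InputStream shielded = new FilterInputStream(zip) {
                    @Override
                    public void close() {
                        // intentionally left open for the next entry
                    }
                };
                roots.put(name, rootElement(shielded));
            }
        }
        return roots;
    }
}
```

Because rootElement() takes a plain InputStream, the same check applies to a 
file on disk, an entry of some other archive type, or a zip entry, which is 
the point of keeping the container handling in its own class.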

> Support for iWork documents
> ---------------------------
>
>                 Key: TIKA-402
>                 URL: https://issues.apache.org/jira/browse/TIKA-402
>             Project: Tika
>          Issue Type: New Feature
>          Components: parser
>            Reporter: Jukka Zitting
>         Attachments: iwork.patch, iwork.patch, iwork.patch, iwork.patch, 
> iwork.patch, iwork.patch, testKeynote.key, testKeynote.key, 
> testNumbers.numbers, testPages.pages
>
>
> It would be nice to have support for documents created by Apple's Keynote and 
> Pages applications. Both file formats are described in 
> http://developer.apple.com/mac/library/documentation/AppleApplications/Conceptual/iWork2-0_XML/Chapter01/Introduction.html.
> I'm not sure if there already are open source parser libraries for these 
> formats or if we'd need to directly process the XML content.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
