[
https://issues.apache.org/jira/browse/TIKA-402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Martijn van Groningen updated TIKA-402:
---------------------------------------
Attachment: iwork.patch
I've attached a new patch that fixes the test failure in a Java 5
environment. Apparently the Java 5 parser delivered the following piece of
text in two events:
Both Pages 1.x and Keynote 2.x
The content handlers applied trim to the text of each event, which resulted
in the following line:
Both Pages 1.xand Keynote 2.x
This caused the assertion failure. I've removed all uses of trim in the
content handlers, which fixes the problem. The test also passes when JDK 6 is
used. The trim calls were apparently not necessary; the text is passed
correctly to the XHTMLContentHandler.
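
To illustrate the failure mode, here is a minimal sketch (not the code from the patch). The class TrimChunksDemo, the CollectingHandler helper and the exact point at which the text is split are assumptions made for illustration; only DefaultHandler.characters() is the real SAX API. It shows how trimming each characters() chunk drops the space at a chunk boundary, while appending the chunks unchanged keeps it.

```java
import org.xml.sax.helpers.DefaultHandler;

public class TrimChunksDemo {

    /** Hypothetical handler that collects character data, optionally trimming each chunk. */
    static class CollectingHandler extends DefaultHandler {
        private final boolean trimChunks;
        private final StringBuilder text = new StringBuilder();

        CollectingHandler(boolean trimChunks) {
            this.trimChunks = trimChunks;
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            String chunk = new String(ch, start, length);
            // Trimming per chunk discards whitespace that belongs between chunks.
            text.append(trimChunks ? chunk.trim() : chunk);
        }

        String text() {
            return text.toString();
        }
    }

    /** Feeds the chunks to a handler as separate characters() events and returns the collected text. */
    public static String collect(boolean trimChunks, String... chunks) {
        CollectingHandler handler = new CollectingHandler(trimChunks);
        for (String chunk : chunks) {
            char[] chars = chunk.toCharArray();
            handler.characters(chars, 0, chars.length);
        }
        return handler.text();
    }

    public static void main(String[] args) {
        // Assumed split point: a SAX parser may break text anywhere.
        String[] chunks = {"Both Pages 1.x ", "and Keynote 2.x"};
        System.out.println(collect(true, chunks));
        System.out.println(collect(false, chunks));
    }
}
```

A SAX parser is free to report contiguous character data in any number of characters() calls, so a handler cannot treat a single chunk as a complete piece of text.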
> Support for iWork documents
> ---------------------------
>
> Key: TIKA-402
> URL: https://issues.apache.org/jira/browse/TIKA-402
> Project: Tika
> Issue Type: New Feature
> Components: parser
> Reporter: Jukka Zitting
> Assignee: Jukka Zitting
> Fix For: 0.8
>
> Attachments: iwork.patch, iwork.patch, iwork.patch, iwork.patch,
> iwork.patch, iwork.patch, iwork.patch, testKeynote.key, testKeynote.key,
> testNumbers.numbers, testPages.pages
>
>
> It would be nice to have support for documents created by Apple's Keynote and
> Pages applications. Both file formats are described in
> http://developer.apple.com/mac/library/documentation/AppleApplications/Conceptual/iWork2-0_XML/Chapter01/Introduction.html.
> I'm not sure if there already are open source parser libraries for these
> formats or if we'd need to directly process the XML content.