[jira] [Commented] (TIKA-1966) Issue in parsing iWorksDocument with Apache Tika

Tim Allison (JIRA) Wed, 04 May 2016 06:24:01 -0700

    [ 
https://issues.apache.org/jira/browse/TIKA-1966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15270609#comment-15270609
 ]


Tim Allison commented on TIKA-1966:
-----------------------------------

Thank you, Nick.  [~anjackson], via Twitter, pointed out these links:

http://fileformats.archiveteam.org/wiki/IWork#iWork_2013

https://github.com/obriensp/iWorkFileFormat/blob/master/Docs/index.md#iwa

and according to http://fileformats.archiveteam.org/wiki/IWA

bq. However, the variant of Snappy that is used does not comply with the spec 
for that format, omitting the stream identifier and checksum.

which might explain why commons-compress doesn't read them.
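
To make the quoted difference concrete, here is a rough sketch of how the chunk framing could be walked by hand. It rests on an assumption taken from my reading of the iWorkFileFormat notes linked above, not on anything verified against the attached files: each chunk starts with a 4-byte header (one type byte that is always 0x00, then a 3-byte little-endian payload length), with no stream identifier and no checksum. The class name IwaChunkSplitter and the method splitChunks are hypothetical and are not part of Tika or commons-compress.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of Tika or commons-compress.
final class IwaChunkSplitter {

    // Assumed header layout: 1 type byte (0x00) + 3-byte little-endian payload length.
    private static final int HEADER_LENGTH = 4;

    private IwaChunkSplitter() {
    }

    // Splits the raw bytes of an .iwa entry into its still-compressed chunk payloads.
    static List<byte[]> splitChunks(byte[] iwa) {
        List<byte[]> chunks = new ArrayList<>();
        int pos = 0;
        while (pos < iwa.length) {
            if (iwa.length - pos < HEADER_LENGTH) {
                throw new IllegalArgumentException("Truncated chunk header at offset " + pos);
            }
            int type = iwa[pos] & 0xFF;
            if (type != 0x00) {
                // Assumption: only type 0x00 (compressed data) chunks occur.
                throw new IllegalArgumentException(
                        "Unexpected chunk type " + type + " at offset " + pos);
            }
            // 24-bit little-endian length of the payload that follows the header.
            int length = (iwa[pos + 1] & 0xFF)
                    | ((iwa[pos + 2] & 0xFF) << 8)
                    | ((iwa[pos + 3] & 0xFF) << 16);
            pos += HEADER_LENGTH;
            if (iwa.length - pos < length) {
                throw new IllegalArgumentException("Truncated chunk payload at offset " + pos);
            }
            byte[] payload = new byte[length];
            System.arraycopy(iwa, pos, payload, 0, length);
            chunks.add(payload);
            pos += length;
        }
        return chunks;
    }
}
```

Each returned payload is still Snappy-compressed. The sketch only separates the chunks; it does not decompress them or check whether any particular library accepts them.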

> Issue in parsing iWorksDocument with Apache Tika
> ------------------------------------------------
>
>                 Key: TIKA-1966
>                 URL: https://issues.apache.org/jira/browse/TIKA-1966
>             Project: Tika
>          Issue Type: Improvement
>          Components: parser
>    Affects Versions: 1.12
>         Environment: Ubuntu 15
>            Reporter: Sachin Shaju
>         Attachments: budget.numbers, connors_20040127.key, pages.pages, 
> sample code
>
>
> I was trying to parse an iWork document with Apache Tika, but I am not getting 
> the parsed content. Instead, the content handler returns some other output. 
> The code snippet that I used is attached.
> Output:
> Contents of the file :
> Index/Document.iwa
> Index/ViewState.iwa
> Index/CalculationEngine.iwa
> Index/Tables/HeaderStorageBucket-2.iwa
> Index/Tables/Tile.iwa
> Index/Metadata.iwa
> Metadata/Properties.plist
> I'm able to detect the file type correctly using the Detector API, but I am 
> not getting the useful content out of the document.
> I'm attaching the iWork documents that I tested with (made with the latest 
> version of iOS). Parsing worked when I tested with older versions. Thanks



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
