[ https://issues.apache.org/jira/browse/TIKA-533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12922038#action_12922038 ]
Jukka Zitting commented on TIKA-533:
------------------------------------
The magic byte pattern we added in TIKA-402 for detecting iWork documents seems
to be too eager, as it also matches this document. Note that it makes no
difference that the test file is itself part of a zip archive; it is detected
as application/vnd.apple.iwork even as a standalone document.
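To illustrate why a prefix-only check struggles here, the following is a
minimal sketch using only java.util.zip. It does not reproduce the actual
TIKA-402 pattern; it only shows that a zip-within-zip file and an iWork-like
zip begin with the same local file header bytes. The class name ZipMagicDemo,
the helper names, and the entry name index.apxl are assumptions chosen for
illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical demo class; not part of Tika.
final class ZipMagicDemo {
    // Local file header signature "PK\003\004" that java.util.zip writes
    // at the start of an archive containing at least one entry.
    static final byte[] ZIP_MAGIC = {0x50, 0x4B, 0x03, 0x04};

    // Builds an in-memory zip archive holding a single entry.
    static byte[] zipOf(String entryName, byte[] content) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ZipOutputStream zip = new ZipOutputStream(buffer)) {
                zip.putNextEntry(new ZipEntry(entryName));
                zip.write(content);
                zip.closeEntry();
            }
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Prefix-only check: looks at the first four bytes and nothing else.
    static boolean startsWithZipMagic(byte[] data) {
        return data.length >= 4
                && Arrays.equals(Arrays.copyOf(data, 4), ZIP_MAGIC);
    }

    public static void main(String[] args) {
        byte[] inner = zipOf("hello.txt", "hello".getBytes(StandardCharsets.UTF_8));
        byte[] zipWithinZip = zipOf("inner.zip", inner);
        // "index.apxl" is an assumed marker entry for an iWork-like package.
        byte[] iworkLike = zipOf("index.apxl",
                "<presentation/>".getBytes(StandardCharsets.UTF_8));

        // Both archives pass the same prefix check, so the prefix alone
        // cannot tell them apart.
        System.out.println(startsWithZipMagic(zipWithinZip));
        System.out.println(startsWithZipMagic(iworkLike));
    }
}
```

Any pattern that goes beyond these four bytes has to depend on what happens
to follow the header, such as the name of the first entry, which is where
accidental matches can creep in.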
I removed the application/vnd.apple.iwork magic byte pattern in revision
1023712, which should solve your problem. It looks like we should use a
container-aware detector for iWork as well, since I also had to disable a few
iWork test cases that no longer detect the format correctly.
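As a rough sketch of what container-aware detection could look like, the
example below walks the zip entries and decides based on their names rather
than on leading bytes. It uses only java.util.zip and is not Tika's detector
API. The class name ContainerAwareSketch, the method names, and the choice of
index.apxl as the marker entry are assumptions for illustration; a real
detector would need to cover the other iWork formats too.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical sketch; not part of Tika.
final class ContainerAwareSketch {
    // Assumed marker entry for an iWork-like package (illustrative only).
    static final String IWORK_MARKER_ENTRY = "index.apxl";

    // Inspects top-level entry names instead of the leading bytes.
    static String detect(byte[] data) {
        boolean sawEntry = false;
        try (ZipInputStream zip =
                new ZipInputStream(new ByteArrayInputStream(data))) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                sawEntry = true;
                if (IWORK_MARKER_ENTRY.equals(entry.getName())) {
                    return "application/vnd.apple.iwork";
                }
            }
        } catch (IOException e) {
            // Unreadable or corrupt container: fall back to the generic type.
            return "application/octet-stream";
        }
        // A readable zip without the marker entry is just a zip.
        return sawEntry ? "application/zip" : "application/octet-stream";
    }

    // Test helper: builds an in-memory zip archive holding a single entry.
    static byte[] zipOf(String entryName, byte[] content) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ZipOutputStream zip = new ZipOutputStream(buffer)) {
                zip.putNextEntry(new ZipEntry(entryName));
                zip.write(content);
                zip.closeEntry();
            }
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Under these assumptions, a zip whose only entry is another zip is reported as
application/zip, because the decision rests on the entry listing and the
nested archive is never mistaken for a marker file.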
> Mis-detection of zip-within-zip as application/vnd.apple.iwork, with no
> output by CLI app
> -----------------------------------------------------------------------------------------
>
> Key: TIKA-533
> URL: https://issues.apache.org/jira/browse/TIKA-533
> Project: Tika
> Issue Type: Bug
> Components: parser
> Affects Versions: 0.8
> Environment: Windows 7 64-bit, latest Tika build as of 18th Oct 2010.
> Reporter: Geoff Jarrad
> Attachments: zip-within-zip.zip
>
>
> It appears that, at least in some circumstances, a zip file containing only
> another zip file is being mis-detected as application/vnd.apple.iwork.
> In addition, for such files, the command-line parser does not return any
> output at all.
--
This message is automatically generated by JIRA.