[
https://issues.apache.org/jira/browse/COMPRESS-222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13633496#comment-13633496
]
Dmitry Katsubo commented on COMPRESS-222:
-----------------------------------------
Thanks a lot for pointing out the problem. It is really not trivial.
I should have used {{IOUtils.readFully()}} for that.
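
For reference, a minimal sketch of the difference between a single {{read()}} call and a read-fully loop, assuming the test only issued one {{read()}} per entry. The class {{HeaderProbe}} and its methods are hypothetical names made up for this illustration; this is neither the attached ArchiveTest.java nor the Commons Compress implementation of {{IOUtils.readFully()}}. A stream is allowed to return fewer bytes than requested, which would leave the rest of the buffer zero-filled, as in the log excerpts quoted below.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical helper for illustration only; not part of Commons Compress. */
final class HeaderProbe {

    /** TIFF signature as quoted in the issue description. */
    static final byte[] TIFF_HEADER = {0x49, 0x49, 0x2A, 0x0, 0x8, 0x0, 0x0, 0x0, 0x14, 0x0};

    /**
     * Keeps calling read() until the buffer is full or the stream ends.
     * Returns the number of bytes actually stored in buf.
     */
    static int readFully(InputStream in, byte[] buf) throws IOException {
        int total = 0;
        while (total < buf.length) {
            int n = in.read(buf, total, buf.length - total);
            if (n < 0) {
                break; // end of stream reached before the buffer was filled
            }
            total += n;
        }
        return total;
    }

    /** True if at least a full signature was read and it matches TIFF_HEADER. */
    static boolean isTiff(byte[] buf, int length) {
        if (length < TIFF_HEADER.length) {
            return false;
        }
        for (int i = 0; i < TIFF_HEADER.length; i++) {
            if (buf[i] != TIFF_HEADER[i]) {
                return false;
            }
        }
        return true;
    }

    /** Test stream that hands out at most 'chunk' bytes per read call (chunk >= 1). */
    static InputStream shortReads(byte[] data, final int chunk) {
        return new ByteArrayInputStream(data) {
            @Override
            public synchronized int read(byte[] b, int off, int len) {
                return super.read(b, off, Math.min(len, chunk));
            }
        };
    }

    public static void main(String[] args) throws IOException {
        byte[] single = new byte[TIFF_HEADER.length];
        int n1 = shortReads(TIFF_HEADER, 3).read(single); // one call, may stop early
        byte[] full = new byte[TIFF_HEADER.length];
        int n2 = readFully(shortReads(TIFF_HEADER, 3), full); // loops until full
        System.out.println("single read: " + n1 + " bytes, TIFF=" + isTiff(single, n1));
        System.out.println("readFully:   " + n2 + " bytes, TIFF=" + isTiff(full, n2));
    }
}
```

The {{shortReads}} stream only simulates a source that returns data in small pieces; a real decompressing stream may split reads at arbitrary points, which would be consistent with the mismatch showing up at different byte offsets.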
> ZipArchiveInputStream may read incorrect bytes from stream when processing
> nested ZIP
> -------------------------------------------------------------------------------------
>
> Key: COMPRESS-222
> URL: https://issues.apache.org/jira/browse/COMPRESS-222
> Project: Commons Compress
> Issue Type: Bug
> Components: Archivers
> Affects Versions: 1.5
> Reporter: Dmitry Katsubo
> Labels: zip
> Fix For: 1.6
>
> Attachments: ArchiveTest.java, ArchiveTest.java,
> log_read_whole_entry.txt, log.txt, md5.correct.txt
>
>
> The problem is related to COMPRESS-189; in particular, it concerns the
> processing of inner ZIP files.
> Problem description:
> If the archive entry is not fully read, then partial reading returns
> incorrect contents.
> In particular, the given example loops through all entries of the
> "09815141_4.zip" ZIP archive, probing whether each entry is a TIFF file. The
> probe assumes that a file is a TIFF if it starts with the bytes
> [0x49 0x49 0x2A 0x0 0x8 0x0 0x0 0x0 0x14 0x0].
> Most entries are correctly reported as TIFF, except:
> {code}
> [ArchiveTest] 000017.tif is something else
> [ArchiveTest] Header contents: 0x49 0x49 0x2A 0x0 0x8 0x0 0x0 0x0 0x0 0x0
> [ArchiveTest] 000033.tif is something else
> [ArchiveTest] Header contents: 0x49 0x49 0x2A 0x0 0x0 0x0 0x0 0x0 0x0 0x0
> [ArchiveTest] 000056.tif is something else
> [ArchiveTest] Header contents: 0x49 0x49 0x2A 0x0 0x8 0x0 0x0 0x0 0x0 0x0
> [ArchiveTest] 000069.tif is something else
> [ArchiveTest] Header contents: 0x49 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0
> {code}
> As far as I can see, the corruption can start at any byte.
> If the program is run with {{READ_WHOLE_ENTRY=true}}, then all entries are
> fully read and an MD5 sum is calculated. The MD5 sums match and all entries
> are correctly reported as TIFF. Thus the problem is only present when an entry
> is not fully read and {{ArchiveInputStream.getNextEntry()}} is called.
> The test ZIP can be downloaded from:
> https://www.dropbox.com/s/h20wo6t0mwbgsqc/09815141_4.zip
> It was originally taken from the WIPO FTP (i.e. it is in the public domain)
> and was trimmed down a bit.
> It is difficult to say what the impact of this bug is, but among the 475
> ZIP-in-ZIPs in my collection I have found 3 examples of incorrect content
> extraction.