[ https://issues.apache.org/jira/browse/COMPRESS-508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17074337#comment-17074337 ]

Stefan Bodewig commented on COMPRESS-508:
-----------------------------------------

What you describe is possible in cases where you can read an {{InputStream}} 
twice, but it would be trickier than you describe.

If you start looking for the central directory (the "real metadata") from the 
start of the stream, you need to be careful not to be tricked by something that 
merely looks like a central directory. ZIPs stored inside ZIPs are a real 
problem here. So you'd have to parse every metadata block you find and then keep 
the one that is actually located where the last "central directory locator" you 
find says it should be. This is why {{ZipFile}} searches for it from the end of 
the archive.
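Here is a minimal sketch of that backwards search. It assumes the whole archive is available as a {{byte[]}} and ignores ZIP64. {{EocdLocator}} and its methods are made-up names for this illustration and are not part of Commons Compress. Requiring the archive comment to end exactly at the end of the data is an extra sanity check of this sketch, not something taken from {{ZipFile}}.

{code:java}
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Sketch: locate the End Of Central Directory (EOCD) record by scanning backwards. ZIP64 is ignored. */
class EocdLocator {
    static final int EOCD_SIGNATURE = 0x06054b50; // "PK\5\6" read as a little-endian int
    static final int EOCD_FIXED_SIZE = 22;         // record size without the archive comment
    static final int MAX_COMMENT_LENGTH = 0xFFFF;  // the comment length is a 16-bit field

    /** Returns the offset of the EOCD record, or -1 if no plausible record exists. */
    static int findEocd(byte[] zip) {
        ByteBuffer buf = ByteBuffer.wrap(zip).order(ByteOrder.LITTLE_ENDIAN);
        // The EOCD can only start within the last 22 + 65535 bytes of the archive.
        int lowest = Math.max(0, zip.length - EOCD_FIXED_SIZE - MAX_COMMENT_LENGTH);
        // Walk backwards so the last (outermost) record wins; a match found further
        // towards the start could belong to a ZIP stored inside this ZIP.
        for (int pos = zip.length - EOCD_FIXED_SIZE; pos >= lowest; pos--) {
            if (buf.getInt(pos) != EOCD_SIGNATURE) {
                continue;
            }
            int commentLength = buf.getShort(pos + 20) & 0xFFFF;
            // Extra check of this sketch: the comment must end exactly at the end of the data.
            if (pos + EOCD_FIXED_SIZE + commentLength == zip.length) {
                return pos;
            }
        }
        return -1;
    }

    /** Total number of central directory records (offset 10 within the EOCD). */
    static int entryCount(byte[] zip, int eocd) {
        return ByteBuffer.wrap(zip).order(ByteOrder.LITTLE_ENDIAN).getShort(eocd + 10) & 0xFFFF;
    }

    /** Offset of the first central directory header (offset 16 within the EOCD). */
    static long centralDirectoryOffset(byte[] zip, int eocd) {
        return ByteBuffer.wrap(zip).order(ByteOrder.LITTLE_ENDIAN).getInt(eocd + 16) & 0xFFFFFFFFL;
    }
}
{code}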

But yes, something that makes two passes would be possible: a first pass that 
grabs the central directory and a second pass that basically jumps forward 
through the entries in a way similar to what 
{{ZipFile.getEntriesInPhysicalOrder}} does. We do not provide anything like 
this, though. Most people either can read a stream only once or have random 
access to the archive, and {{ZipArchiveInputStream}} and {{ZipFile}} are 
tailored to these two situations. So far we have never considered the case where 
you can open a stream more than once, which probably means that it is not that 
common.
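For the first of the two passes, something along these lines would be enough. This is a minimal sketch that parses the central directory records of an archive held in a {{byte[]}}. It takes the central directory offset and entry count from the end of central directory record as inputs. The class {{CentralDirectoryReader}} is hypothetical and not part of Commons Compress. The sketch ignores ZIP64 and returns the entries sorted by local header offset, which is the order in which a forward-only second pass would reach them.

{code:java}
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Sketch: read central directory records from an in-memory archive. ZIP64 is ignored. */
class CentralDirectoryReader {
    static final int CFH_SIGNATURE = 0x02014b50; // "PK\1\2" read as a little-endian int
    static final int CFH_FIXED_SIZE = 46;         // header size without name, extra field and comment

    /** The few fields of a central directory record this sketch cares about. */
    static final class Entry {
        final String name;
        final long compressedSize;
        final long size;
        final long localHeaderOffset;

        Entry(String name, long compressedSize, long size, long localHeaderOffset) {
            this.name = name;
            this.compressedSize = compressedSize;
            this.size = size;
            this.localHeaderOffset = localHeaderOffset;
        }

        @Override
        public String toString() {
            return name + " (" + size + " bytes, " + compressedSize + " compressed, local header at "
                    + localHeaderOffset + ")";
        }
    }

    static List<Entry> readCentralDirectory(byte[] zip, long cdOffset, int entryCount) {
        ByteBuffer buf = ByteBuffer.wrap(zip).order(ByteOrder.LITTLE_ENDIAN);
        List<Entry> entries = new ArrayList<>();
        // byte[] indices are ints, so this sketch is limited to archives below 2 GiB.
        int pos = Math.toIntExact(cdOffset);
        for (int i = 0; i < entryCount; i++) {
            if (buf.getInt(pos) != CFH_SIGNATURE) {
                throw new IllegalStateException("no central directory header at offset " + pos);
            }
            long compressedSize = buf.getInt(pos + 20) & 0xFFFFFFFFL;
            long size = buf.getInt(pos + 24) & 0xFFFFFFFFL;
            int nameLength = buf.getShort(pos + 28) & 0xFFFF;
            int extraLength = buf.getShort(pos + 30) & 0xFFFF;
            int commentLength = buf.getShort(pos + 32) & 0xFFFF;
            long localHeaderOffset = buf.getInt(pos + 42) & 0xFFFFFFFFL;
            // Decoded as UTF-8 for brevity; strictly, names are CP437 unless general purpose bit 11 is set.
            String name = new String(zip, pos + CFH_FIXED_SIZE, nameLength, StandardCharsets.UTF_8);
            entries.add(new Entry(name, compressedSize, size, localHeaderOffset));
            pos += CFH_FIXED_SIZE + nameLength + extraLength + commentLength;
        }
        // Physical order: the order in which a forward-only second pass would reach the entries.
        entries.sort(Comparator.comparingLong((Entry e) -> e.localHeaderOffset));
        return entries;
    }
}
{code}

The central directory is written after all entry data. It therefore holds the real sizes even for entries whose local headers defer them to a data descriptor. That is exactly the information {{ZipArchiveInputStream}} cannot know yet when it returns such an entry.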

For some perspective on archive format history: most of the archive formats 
Commons Compress can read do not store the metadata of all entries in one place 
at all. TAR and CPIO basically store the metadata of a single entry, immediately 
followed by that entry's data, and repeat this for every entry. This is 
optimized for writing to streaming media where going back and inserting data is 
expensive or even impossible (think tapes). One format that does have such a 
central metadata directory is 7z, whose start header records where that 
directory is located; 7z is also the youngest of these formats. The basic 
structure of ZIP was defined by Phil Katz more than three decades ago, when 
floppy disks were the most modern medium, which is why it supports multi-volume 
archives.

> Bug: cannot get file size of ArchiveEntry using ZipArchiveInputStream
> ---------------------------------------------------------------------
>
>                 Key: COMPRESS-508
>                 URL: https://issues.apache.org/jira/browse/COMPRESS-508
>             Project: Commons Compress
>          Issue Type: Bug
>          Components: Build
>    Affects Versions: 1.20
>         Environment: Android 9 and Android 10, on both emulator and real 
> device.
>            Reporter: AD_LB
>            Priority: Major
>         Attachments: 2020-03-31_20-53-36.png, 2020-04-01_18-28-19.mp4, 
> ZipTest.zip, ZipTest2.zip, test.zip
>
>
> I'm trying to use ZipArchiveInputStream to iterate over the items of a zip 
> file (which may or may not be a real file on the file-system, which is why I 
> use a stream), optionally creating a stream from specific entries.
> One of the operations I need is to get the size of the files within.
> For some reason, it fails to do so. Not only that, but it throws an exception 
> when I'm done with it:
> {code:java}
> Error:org.apache.commons.compress.archivers.zip.UnsupportedZipFeatureException: Unsupported feature data descriptor used in entry ...
> {code}
> I've attached 3 files here: a sample project, the problematic zip file 
> (remember that you need to put it in the correct path and grant storage 
> permission), and a screenshot of the issue.
> Note that if I open the file using a third-party PC app (such as 
> [7-zip|https://www.7-zip.org/]), it works fine, including showing the sizes 
> of the files inside.
> Files:
> !2020-03-31_20-53-36.png! [^test.zip]
> [^ZipTest.zip]
> Here's the relevant code (Kotlin):
>  
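> {code:java}
> import android.util.Log
> import java.io.File
> import java.io.FileInputStream
> import kotlin.concurrent.thread
> import org.apache.commons.compress.archivers.zip.ZipArchiveInputStream
>
>         thread {
>             try {
>                 val file = File("/storage/emulated/0/test.zip")
>                 ZipArchiveInputStream(FileInputStream(file)).use {
>                     while (true) {
>                         val entry = it.nextEntry ?: break
>                         // If the entry uses a data descriptor, entry.size is -1 (unknown)
>                         // until the entry's data has been read.
>                         Log.d("AppLog", "entry:${entry.name} ${entry.size} ")
>                     }
>                 }
>                 Log.d("AppLog", "got archive ")
>             } catch (e: Exception) {
>                 Log.d("AppLog", "Error:$e")
>                 e.printStackTrace()
>             }
>         }
> {code}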



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
