[ 
https://issues.apache.org/jira/browse/COMPRESS-574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17322748#comment-17322748
 ] 

Gaël Lalire commented on COMPRESS-574:
--------------------------------------

I will try to be more concrete.

You have a ZIP Z containing 3 files (A, B, C), and you want to let the user
choose which ones to download.

So if the user requests download?zip_name=myzip&file_names=A,C

then your server has to download Z, decompress only A and C, and create a new
ZIP containing only A and C.

Decompressing and recompressing costs both disk space and CPU time.
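
For comparison, here is a minimal sketch of that approach. It uses only `java.util.zip` and assumes the whole of Z arrives as an `InputStream`. The class name `RezipFilter` is made up for illustration. Every kept entry is inflated and then deflated again:

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Set;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

class RezipFilter {
    // Copies only the entries whose names are in `wanted` from `source` to `target`.
    // Each kept entry is inflated while reading and deflated again while writing.
    static void filter(InputStream source, Set<String> wanted, OutputStream target) throws IOException {
        try (ZipInputStream in = new ZipInputStream(source);
             ZipOutputStream out = new ZipOutputStream(target)) {
            byte[] buf = new byte[8192];
            ZipEntry entry;
            while ((entry = in.getNextEntry()) != null) {
                if (!wanted.contains(entry.getName())) {
                    continue; // getNextEntry() still reads through this entry's data
                }
                out.putNextEntry(new ZipEntry(entry.getName())); // DEFLATED by default
                int n;
                while ((n = in.read(buf)) != -1) {
                    out.write(buf, 0, n);
                }
                out.closeEntry();
            }
        }
    }
}
```

Even skipped entries such as B are read through. `ZipInputStream` has to consume an entry's data to reach the next local header, so the cost grows with the size of Z, not with the size of the selection.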

 

With my solution, you don't store Z at all. Instead, you store A, B, and C, and
possibly A_DEFLATED, B_DEFLATED, and C_DEFLATED (when compression is worth it).

So if a user requests download?zip_name=myzip&file_names=A,C

then your server streams the ZIP: only A (or A_DEFLATED) and C (or C_DEFLATED)
are fetched, never B's content, and the fetched content needs no buffer or disk
space because it is streamed.
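
A minimal sketch of the streaming side follows. It assumes the components are kept uncompressed and that their CRC-32 and size are already known. `ComponentMeta` and `ComponentStore` are hypothetical stand-ins for the database row and the S3 bucket, not part of any library. Because a STORED entry's size and CRC-32 can be declared up front, `java.util.zip.ZipOutputStream` writes each local header before the data, and the bytes flow straight from the store to the client:

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

class StreamedZip {
    // Hypothetical metadata row, standing in for the database record (name, CRC32, size).
    static final class ComponentMeta {
        final String name;
        final long crc32;
        final long size;

        ComponentMeta(String name, long crc32, long size) {
            this.name = name;
            this.crc32 = crc32;
            this.size = size;
        }
    }

    // Hypothetical blob store, standing in for S3: opens a stream over one component.
    interface ComponentStore {
        InputStream open(String name) throws IOException;
    }

    // Writes the selected components as STORED entries. Only the selected components
    // are opened, and their bytes are copied to `target` without a temporary file.
    static void write(List<ComponentMeta> selected, ComponentStore store, OutputStream target) throws IOException {
        ZipOutputStream out = new ZipOutputStream(target);
        out.setMethod(ZipOutputStream.STORED);
        byte[] buf = new byte[8192];
        for (ComponentMeta meta : selected) {
            ZipEntry entry = new ZipEntry(meta.name);
            entry.setSize(meta.size); // STORED entries need size and CRC-32 before the data
            entry.setCrc(meta.crc32);
            out.putNextEntry(entry);
            try (InputStream in = store.open(meta.name)) {
                int n;
                while ((n = in.read(buf)) != -1) {
                    out.write(buf, 0, n);
                }
            }
            out.closeEntry(); // fails if the bytes do not match the declared size or CRC-32
        }
        out.finish(); // writes the central directory but leaves `target` open
    }
}
```

This sketch covers neither the `_DEFLATED` variants nor byte ranges. `ZipOutputStream` always compresses DEFLATED entries itself, and it only offers an `OutputStream`, which is the limitation the issue describes. Commons Compress's `ZipArchiveOutputStream.addRawArchiveEntry` can copy already-deflated entry data, but it is still push-style output.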

The byte-range part comes in when the user does

curl -r 200-500 'download?zip_name=myzip&file_names=A,C'

If bytes 200-500 are the last 10 bytes of C's local header followed by the
first 291 bytes of C's data, then only C's content is fetched; A's content is
not needed.
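
The range logic can be sketched as follows. The sketch assumes local headers with no extra field, no data descriptor, and no ZIP64 records, so each header is 30 bytes plus the UTF-8 encoded name. `ZipLayout` and `Segment` are hypothetical names, and the attached DynamicZip.java may work differently. Every offset follows from the stored metadata, so the server can find which parts of the archive a range touches without reading any content:

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class ZipLayout {
    // One contiguous region of the generated ZIP: a local header or an entry's data.
    static final class Segment {
        final String entry;   // component name
        final boolean header; // true = local file header, false = entry data
        final long start;     // offset of the first byte within the ZIP
        final long length;

        Segment(String entry, boolean header, long start, long length) {
            this.entry = entry;
            this.header = header;
            this.start = start;
            this.length = length;
        }

        long end() { // offset of the last byte (inclusive)
            return start + length - 1;
        }
    }

    // Lays out local headers and data back to back, computed from metadata only.
    static List<Segment> layout(List<String> names, List<Long> storedSizes) {
        List<Segment> segments = new ArrayList<>();
        long offset = 0;
        for (int i = 0; i < names.size(); i++) {
            String name = names.get(i);
            long headerLength = 30 + name.getBytes(StandardCharsets.UTF_8).length;
            segments.add(new Segment(name, true, offset, headerLength));
            offset += headerLength;
            segments.add(new Segment(name, false, offset, storedSizes.get(i)));
            offset += storedSizes.get(i);
        }
        return segments; // the central directory (not modeled here) would start at `offset`
    }

    // Returns the segments intersecting the inclusive range [first, last],
    // as requested by "Range: bytes=first-last".
    static List<Segment> overlapping(List<Segment> segments, long first, long last) {
        List<Segment> result = new ArrayList<>();
        for (Segment s : segments) {
            if (s.length > 0 && s.start <= last && s.end() >= first) {
                result.add(s);
            }
        }
        return result;
    }
}
```

Suppose A's stored data is 148 bytes. Then C's local header occupies bytes 179 to 209 and C's data starts at byte 210. A range of 200-500 therefore returns only C's header and C's data. The last 10 header bytes can be generated from metadata and the first 291 data bytes fetched with a ranged read, while A is never opened.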

 

In my case, A, B, and C are stored in Amazon S3, and I store the file metadata
(name, CRC32, size, deflated size) in an Oracle DB.
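
The metadata can be computed once, when a component is uploaded. The sketch below assumes the content fits in memory. `ComponentMetadata` is a hypothetical name, and the 10% threshold in `deflatedWorthKeeping` is only one illustrative reading of "worth it". The `nowrap` flag makes `Deflater` emit raw DEFLATE data without a zlib header, which is the form a ZIP entry stores:

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.CRC32;
import java.util.zip.Deflater;

class ComponentMetadata {
    final String name;
    final long crc32;
    final long size;
    final byte[] deflated; // raw DEFLATE data, as it would appear inside a ZIP entry

    ComponentMetadata(String name, long crc32, long size, byte[] deflated) {
        this.name = name;
        this.crc32 = crc32;
        this.size = size;
        this.deflated = deflated;
    }

    // Computes CRC-32, size and the deflated variant in one pass over the content.
    static ComponentMetadata of(String name, byte[] content) {
        CRC32 crc = new CRC32();
        crc.update(content);
        Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION, true); // raw DEFLATE
        deflater.setInput(content);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return new ComponentMetadata(name, crc.getValue(), content.length, out.toByteArray());
    }

    long deflatedSize() {
        return deflated.length;
    }

    // Hypothetical rule: keep the deflated copy only if it saves at least 10%.
    boolean deflatedWorthKeeping() {
        return deflatedSize() < size * 0.9;
    }
}
```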

Here I used a file_names filter, but the filter can be more complex: user
permission checks, type filters, and so on.
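
Selection only touches metadata, so filters can be composed before any content is fetched. Here is a sketch with `java.util.function.Predicate`. The `Component` fields (`type`, `requiredRole`) are hypothetical examples of what a permission or type filter might inspect:

```java
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;
import java.util.stream.Collectors;

class ComponentFilters {
    // Hypothetical metadata fields a filter might inspect; no content is involved.
    static final class Component {
        final String name;
        final String type;
        final String requiredRole;

        Component(String name, String type, String requiredRole) {
            this.name = name;
            this.type = type;
            this.requiredRole = requiredRole;
        }
    }

    // Keeps the components named in the request, e.g. file_names=A,C.
    static Predicate<Component> nameIn(Set<String> names) {
        return c -> names.contains(c.name);
    }

    // Keeps the components the current user is allowed to download.
    static Predicate<Component> userHasRole(Set<String> roles) {
        return c -> roles.contains(c.requiredRole);
    }

    // Keeps the components of one type.
    static Predicate<Component> typeIs(String type) {
        return c -> c.type.equals(type);
    }

    // Applies the combined filter to the metadata; in this scheme only the
    // returned components would then be fetched from storage.
    static List<Component> select(List<Component> all, Predicate<Component> filter) {
        return all.stream().filter(filter).collect(Collectors.toList());
    }
}
```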

 

The problems were too much I/O spent creating the filtered ZIP, and sometimes
not enough disk space when too many users were downloading at the same time.

 

 

> Byte range support in archive creation
> --------------------------------------
>
>                 Key: COMPRESS-574
>                 URL: https://issues.apache.org/jira/browse/COMPRESS-574
>             Project: Commons Compress
>          Issue Type: Improvement
>          Components: Archivers
>            Reporter: Gaël Lalire
>            Priority: Minor
>         Attachments: DynamicZip.java, DynamicZipTest.java
>
>
> When you have a ZIP which contains _N_ components and you want to let the
> user choose which components they need, you need to create _2^N - 1_ ZIPs.
> So the idea is to store each component once (or twice if you want both
> deflated and stored versions), and create the ZIP on the fly.
> For the moment, you can stream with a ZipOutputStream, but if you need an
> InputStream, things get a lot harder. I guess programs write the ZIP to a
> file system and read it back afterwards, so it is not really streaming anymore.
> Also, ZipOutputStream will never let you resume from a byte range; you
> need to generate all the preceding data.
> So I made a class to do that; I think such functionality has its place in
> Commons Compress.
> My code is attached: you can adapt it for better integration or support for
> other archive types, or simply use it for inspiration.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
