[jira] [Commented] (COMPRESS-485) Reproducible Builds: keep entries order when gathering ScatterZipOutputStream content in ParallelScatterZipCreator

JIRA Fri, 17 May 2019 02:08:37 -0700


    [ 
https://issues.apache.org/jira/browse/COMPRESS-485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16842022#comment-16842022
 ]


Hervé Boutemy commented on COMPRESS-485:
----------------------------------------

I didn't change anything in the parallel execution code: it works exactly the 
same way as before:
1. entries are compressed to a temporary file per thread, stored as 
{{List<ScatterZipOutputStream> streams}}: 
https://github.com/apache/commons-compress/blob/rel/1.18/src/main/java/org/apache/commons/compress/archivers/zip/ParallelScatterZipCreator.java#L55
 (initialized from ThreadLocal code: 
https://github.com/apache/commons-compress/blob/rel/1.18/src/main/java/org/apache/commons/compress/archivers/zip/ParallelScatterZipCreator.java#L82
 )
2. then temporary files are merged into the target full zip file 
https://github.com/apache/commons-compress/blob/rel/1.18/src/main/java/org/apache/commons/compress/archivers/zip/ParallelScatterZipCreator.java#L257
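
To see why this two-step scheme does not preserve entry order on its own, here 
is a minimal model. It is not the library's code: the class and method names 
are hypothetical, and plain string lists stand in for the per-thread temporary 
files. It assumes the merge simply copies each per-thread buffer in turn.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of a stream-by-stream merge; not commons-compress code.
final class ScatterMergeSketch {

    /** Copies each per-thread buffer in turn into the target, one whole buffer after another. */
    static List<String> mergeStreamByStream(List<List<String>> perThreadBuffers) {
        List<String> target = new ArrayList<>();
        for (List<String> buffer : perThreadBuffers) {
            target.addAll(buffer);
        }
        return target;
    }

    public static void main(String[] args) {
        // Entries were submitted as a, b, c, d. Suppose thread 0 happened to
        // compress a and c while thread 1 compressed b and d.
        List<List<String>> buffers = Arrays.asList(
                Arrays.asList("a", "c"),
                Arrays.asList("b", "d"));
        System.out.println(mergeStreamByStream(buffers)); // prints [a, c, b, d]
    }
}
```

Which thread picks up which entry depends on scheduling, so the merged order 
can differ from one run to the next even for identical input.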

I just used the fact that the entries to be compressed are already kept in 
order in a list, {{private final List<Future<Object>> futures = new ArrayList<>();}}: see 
https://github.com/apache/commons-compress/blob/rel/1.18/src/main/java/org/apache/commons/compress/archivers/zip/ParallelScatterZipCreator.java#L58

De facto, we already have the order in memory; it was just not used when 
gathering the compressed content: 
https://github.com/apache/commons-compress/blob/rel/1.18/src/main/java/org/apache/commons/compress/archivers/zip/ParallelScatterZipCreator.java#L257

Then, instead of iterating over {{ScatterZipOutputStream streams}}, I iterate 
over the {{futures}} list: 
https://github.com/apache/commons-compress/pull/78/commits/ee88360bb377d3816115dc33e2b999d1902034dc#diff-f65928086a3bd28a1beff1c6e37b7306R261
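
The idea of gathering by future rather than by stream can be sketched as 
follows. This is a simplified illustration under stated assumptions, not the 
code from the pull request: {{Scatter}}, {{compressInOrder}} and the 
"compressed:" prefix are hypothetical stand-ins, and an in-memory queue 
replaces the per-thread temporary file. Each task returns the per-thread 
buffer it wrote to, so walking the futures in submission order tells us which 
buffer holds the next entry.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch of order-preserving gathering; not commons-compress code.
final class OrderedGatherSketch {

    /** Stand-in for a per-thread scatter stream: entries in the order that thread wrote them. */
    static final class Scatter {
        final Queue<String> entries = new ArrayDeque<>();
    }

    static List<String> compressInOrder(List<String> names, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        ThreadLocal<Scatter> perThread = ThreadLocal.withInitial(Scatter::new);
        List<Future<Scatter>> futures = new ArrayList<>();
        try {
            // Step 1: submit in the desired order; each task remembers its buffer.
            for (String name : names) {
                futures.add(pool.submit(() -> {
                    Scatter scatter = perThread.get();
                    scatter.entries.add("compressed:" + name);
                    return scatter;
                }));
            }
            // Wait for every task before reading any buffer, so no worker is
            // still appending while the gathering loop below removes entries.
            for (Future<Scatter> future : futures) {
                future.get();
            }
            // Step 2: gather by future, not by buffer. A fixed pool hands tasks
            // out in submission order, so each buffer's head is the next entry.
            List<String> target = new ArrayList<>();
            for (Future<Scatter> future : futures) {
                target.add(future.get().entries.remove());
            }
            return target;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

The compression work still runs in parallel and still lands in per-thread 
buffers; only the final gathering loop changes, which is why the order can be 
restored without touching the parallel execution itself.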

> Reproducible Builds: keep entries order when gathering ScatterZipOutputStream 
> content in ParallelScatterZipCreator
> ------------------------------------------------------------------------------------------------------------------
>
>                 Key: COMPRESS-485
>                 URL: https://issues.apache.org/jira/browse/COMPRESS-485
>             Project: Commons Compress
>          Issue Type: Improvement
>          Components: Archivers
>    Affects Versions: 1.18
>            Reporter: Hervé Boutemy
>            Priority: Major
>          Time Spent: 2h
>  Remaining Estimate: 0h
>
> Currently, zip files created using ParallelScatterZipCreator have their 
> entries in random order.
> This causes issues when trying to do Reproducible Builds with Maven 
> (MNG-6276).
> Studying ParallelScatterZipCreator shows that entries are kept in order in 
> memory in the futures list: instead of writing each full scatter in sequence, 
> iterating over the futures should allow writing each zip entry in its original 
> order, without changing the API or the performance of the gathering process.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
