[
https://issues.apache.org/jira/browse/COMPRESS-485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16842022#comment-16842022
]
Hervé Boutemy commented on COMPRESS-485:
----------------------------------------
I didn't change anything in the parallel execution code; it works exactly the
same way as before:
1. each thread compresses its entries to its own temporary file; these are tracked in
{{List<ScatterZipOutputStream> streams}}:
https://github.com/apache/commons-compress/blob/rel/1.18/src/main/java/org/apache/commons/compress/archivers/zip/ParallelScatterZipCreator.java#L55
(initialized from ThreadLocal code:
https://github.com/apache/commons-compress/blob/rel/1.18/src/main/java/org/apache/commons/compress/archivers/zip/ParallelScatterZipCreator.java#L82
)
2. the temporary files are then merged into the final target zip file:
https://github.com/apache/commons-compress/blob/rel/1.18/src/main/java/org/apache/commons/compress/archivers/zip/ParallelScatterZipCreator.java#L257
I simply relied on the fact that the entries to be compressed are already kept,
in submission order, in a list:
{{private final List<Future<Object>> futures = new ArrayList<>();}}, see
https://github.com/apache/commons-compress/blob/rel/1.18/src/main/java/org/apache/commons/compress/archivers/zip/ParallelScatterZipCreator.java#L58
So the order is de facto available in memory; it was just not used when
gathering the compressed content:
https://github.com/apache/commons-compress/blob/rel/1.18/src/main/java/org/apache/commons/compress/archivers/zip/ParallelScatterZipCreator.java#L257
Instead of iterating over {{List<ScatterZipOutputStream> streams}}, I now
iterate over the {{futures}} list:
https://github.com/apache/commons-compress/pull/78/commits/ee88360bb377d3816115dc33e2b999d1902034dc#diff-f65928086a3bd28a1beff1c6e37b7306R261
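To illustrate the idea independently of the real classes, here is a minimal sketch in plain Java. It is a simplified model and not the commons-compress code: {{Scatter}}, {{OrderedGatherSketch}} and {{compressAndGather}} are hypothetical names, an in-memory list stands in for each per-thread temporary file, and it assumes each task returns the buffer it wrote to so that the gathering step can pull the next entry from that buffer.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class OrderedGatherSketch {

    /** Hypothetical stand-in for one per-thread temporary file. */
    static final class Scatter {
        final List<String> entries = new ArrayList<>();
    }

    /** "Compresses" names in parallel, then gathers them in submission order. */
    static List<String> compressAndGather(List<String> names, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        // Each worker thread lazily gets its own buffer, as with the ThreadLocal in step 1.
        ThreadLocal<Scatter> perThread = ThreadLocal.withInitial(Scatter::new);
        // Futures are appended in submission order: this list is the order we want.
        List<Future<Scatter>> futures = new ArrayList<>();
        try {
            for (String name : names) {
                futures.add(pool.submit(() -> {
                    Scatter scatter = perThread.get();
                    scatter.entries.add(name); // write into this thread's buffer
                    return scatter;            // assumption: the task reports where it wrote
                }));
            }
            // Wait for every task, so no buffer is still being written to.
            for (Future<Scatter> future : futures) {
                future.get();
            }
            // Gather: walk the futures, not the buffers. Each buffer keeps a read
            // cursor; a worker runs its tasks one after another, so the next unread
            // entry in a buffer belongs to the next future that points at it.
            Map<Scatter, Iterator<String>> cursors = new IdentityHashMap<>();
            List<String> target = new ArrayList<>();
            for (Future<Scatter> future : futures) {
                Scatter scatter = future.get();
                Iterator<String> cursor =
                        cursors.computeIfAbsent(scatter, s -> s.entries.iterator());
                target.add(cursor.next());
            }
            return target;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Calling {{OrderedGatherSketch.compressAndGather(names, 3)}} should return the names in the order they were submitted, whichever thread handled each one. Concatenating the per-thread buffers one after another would instead group entries by thread.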
> Reproducible Builds: keep entries order when gathering ScatterZipOutputStream
> content in ParallelScatterZipCreator
> ------------------------------------------------------------------------------------------------------------------
>
> Key: COMPRESS-485
> URL: https://issues.apache.org/jira/browse/COMPRESS-485
> Project: Commons Compress
> Issue Type: Improvement
> Components: Archivers
> Affects Versions: 1.18
> Reporter: Hervé Boutemy
> Priority: Major
> Time Spent: 2h
> Remaining Estimate: 0h
>
> Currently, zip files created using ParallelScatterZipCreator have their
> entries in random order.
> This causes issues when trying to do Reproducible Builds with Maven
> (MNG-6276).
> Studying ParallelScatterZipCreator, entries are kept sorted in memory in the
> futures list: instead of writing each full scatter in sequence, iterating
> over futures should make it possible to write each zip entry in its original
> order, without changing the API or affecting the performance of the gathering
> process.