[jira] [Work logged] (COMPRESS-485) Reproducible Builds: keep entries order when gathering ScatterZipOutputStream content in ParallelScatterZipCreator

ASF GitHub Bot (JIRA) Thu, 08 Aug 2019 02:16:08 -0700


     [ 
https://issues.apache.org/jira/browse/COMPRESS-485?focusedWorklogId=291094&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-291094
 ]


ASF GitHub Bot logged work on COMPRESS-485:
-------------------------------------------

                Author: ASF GitHub Bot
            Created on: 08/Aug/19 09:15
            Start Date: 08/Aug/19 09:15
    Worklog Time Spent: 10m 
      Work Description: bodewig commented on issue #79: Pull/78  COMPRESS-485 + 
Substituting 'synchronized' with faster and fully thread-safe collections 
'ConcurrentLinkedDeque' and iterators.
URL: https://github.com/apache/commons-compress/pull/79#issuecomment-519438687
 
 
   Thanks @Tibor17 
   
   Actually I've more or less been thinking out loud and not raising issues.
   
   I was trying to figure out whether the `synchronized` usage was giving the 
API users - or our code - any extra guarantees that the non-blocking collection 
code didn't. Keep in mind that I'm not the author of the original code either. 
I am the one who added the `synchronized` around the iteration over `streams` - 
but completely overlooked the iteration over `futures` before that. Doesn't 
sound as if I was the most qualified person to comment ;-)
   
   What I meant by the first part was that if you added new threads once 
`writeTo` was underway then whatever your new threads contributed would not be 
part of the result - while it now is undefined. Looking at the current code in 
master I see I was wrong, as `writeTo` does quite a few things before 
entering the synchronized block and there is enough leeway anyway.
   
   No, I don't think we need to exclude the methods from each other. The class 
has a very clear usage pattern of two distinct phases:
   
   1. add all the things you want to add
   2. call `writeTo`
   
   and it should be clear that calling `writeTo` before you are done with the 
first phase is a dubious idea, particularly as the javadocs of `writeTo` state 
that it will shut down the executor.
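
   The two phases above can be illustrated with a minimal, JDK-only sketch. 
`TwoPhaseCollector` and its `submit` method are hypothetical names invented for 
this example; this is not the actual `ParallelScatterZipCreator` code, only an 
assumption-laden outline of the usage pattern with a non-blocking collection.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentLinkedDeque;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in, NOT the real ParallelScatterZipCreator.
public class TwoPhaseCollector {
    private final ExecutorService executor = Executors.newFixedThreadPool(4);
    // Futures are stored in the order they were added; no synchronized block is used.
    private final ConcurrentLinkedDeque<Future<String>> futures = new ConcurrentLinkedDeque<>();

    // Phase 1: add all the things you want to add.
    public void submit(Callable<String> task) {
        futures.add(executor.submit(task));
    }

    // Phase 2: call once when phase 1 is finished; shuts down the executor.
    public List<String> writeTo() {
        executor.shutdown();
        List<String> results = new ArrayList<>();
        try {
            if (!executor.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("tasks did not finish in time");
            }
            // Iterating over the futures yields results in the order they were added,
            // regardless of which worker thread finished first.
            for (Future<String> future : futures) {
                results.add(future.get());
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        }
        return results;
    }
}
```

   In this sketch a `submit` issued after `writeTo` fails with a 
`RejectedExecutionException`, because the executor has already been shut down.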
   
   I am aware of the iteration guarantees of `ConcurrentLinkedDeque`. Back when 
Kristian added the parallel zip support, Commons Compress' baseline was 
Java 5 (we bumped that to 6 with Compress 1.12 and 7 in Compress 1.13, which is 
where we are today). If it had been Java 7 back then, I'm sure Kristian 
would have used non-blocking collections instead.
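
   For readers unfamiliar with those iteration guarantees, here is a small 
JDK-only sketch. `IterationSketch` and `failsFast` are hypothetical names made 
up for this example. It appends one element while iterating: an `ArrayList` 
iterator is fail-fast and throws `ConcurrentModificationException`, whereas the 
`ConcurrentLinkedDeque` iterator is weakly consistent and does not.

```java
import java.util.Collection;
import java.util.ConcurrentModificationException;

// Hypothetical helper for illustration only.
public class IterationSketch {
    // Fills the collection, appends one element mid-iteration, and reports
    // whether the iterator threw ConcurrentModificationException.
    public static boolean failsFast(Collection<String> items) {
        items.add("a");
        items.add("b");
        items.add("c");
        boolean added = false;
        try {
            for (String ignored : items) {
                if (!added) {
                    // Modify the collection while it is being iterated (once only).
                    items.add("late");
                    added = true;
                }
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }
}
```

   Whether the weakly consistent iterator also visits the element added during 
iteration is not guaranteed either way, so the sketch makes no claim about it.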
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

    Worklog Id:     (was: 291094)
    Time Spent: 3.5h  (was: 3h 20m)

> Reproducible Builds: keep entries order when gathering ScatterZipOutputStream 
> content in ParallelScatterZipCreator
> ------------------------------------------------------------------------------------------------------------------
>
>                 Key: COMPRESS-485
>                 URL: https://issues.apache.org/jira/browse/COMPRESS-485
>             Project: Commons Compress
>          Issue Type: Improvement
>          Components: Archivers
>    Affects Versions: 1.18
>            Reporter: Hervé Boutemy
>            Priority: Major
>             Fix For: 1.19
>
>          Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> Currently, zip files created using ParallelScatterZipCreator have their 
> entries in random order.
> This is causing issues when trying to do Reproducible Builds with Maven 
> MNG-6276
> Studying ParallelScatterZipCreator, entries are kept sorted in memory in the 
> futures list: instead of writing each full scatter in sequence, iterating 
> over futures should make it possible to write each zip entry in its original 
> order, without changing the API or the performance of the gathering process



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
