gruuya opened a new pull request, #2681:
URL: https://github.com/apache/arrow-rs/pull/2681

   # Which issue does this PR close?
   
   Closes #2288.
   
   # Rationale for this change
    
   I'm working on https://github.com/splitgraph/seafowl/pull/99, and I've been 
seeing sporadic multipart upload failures with error `Missing information for 
upload part x`. After a brief investigation I think the fix is quite simple, 
since the underlying problem stems from an incorrect assumption about the size 
of the completed parts vector. In other words, in `poll_tasks` of 
`CloudMultiPartUpload` we should get the size of the `completed_parts` vector 
(needed for resizing) for each iteration of the while loop, instead of 
calculating it prior to entering the loop.
   
   To demonstrate how this issue arises consider the following example:
   - there are no parts initially, so `completed_parts = []`
   - tasks for parts 0, 1, and 2 are created and we enter `poll_tasks`; 
`total_parts = 0`
   - imagine parts 1, 0, and 2 finish, **in that order**
   - for part 1, we resize `completed_parts` to max(1 + 1, 0), and set the 
element at index 1 to the incoming part: `completed_parts = [None, 
Some(part_1)]`
   - critically, for part 0, we now resize `completed_parts` to max(0 + 1, 0), 
which actually means we **truncate** the last element (thus losing it), and 
then set the element at index 0: `completed_parts = [Some(part_0)]`
   - lastly, part 2 comes in; we resize `completed_parts` to max(2 + 1, 0), and 
fill in the last element: `completed_parts = [Some(part_0), None, Some(part_2)]`
   - once all other parts are completed, we go to `poll_shutdown` and the 
`None` above leads to `Missing information for upload part 1`
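
The sequence above can be reproduced in isolation. The following is a minimal sketch, not the actual `poll_tasks` code: `Part`, `collect_stale_len`, and `collect_fresh_len` are hypothetical names invented for illustration, and the asynchronous task polling is replaced by a plain slice giving the order in which parts finish. It relies only on the fact that `Vec::resize` truncates when the requested length is smaller than the current one.

```rust
/// Hypothetical stand-in for an uploaded part (not the real object_store type).
#[derive(Debug, Clone, PartialEq)]
struct Part(usize);

/// Buggy variant: the vector length is read once, before the loop.
fn collect_stale_len(finish_order: &[usize]) -> Vec<Option<Part>> {
    let mut completed_parts: Vec<Option<Part>> = Vec::new();
    // Goes stale as soon as the first resize below grows the vector.
    let total_parts = completed_parts.len();
    for &idx in finish_order {
        // With a stale `total_parts`, this can shrink the vector and
        // drop parts that were already recorded at higher indices.
        completed_parts.resize(std::cmp::max(idx + 1, total_parts), None);
        completed_parts[idx] = Some(Part(idx));
    }
    completed_parts
}

/// Fixed variant: the vector length is re-read on every iteration.
fn collect_fresh_len(finish_order: &[usize]) -> Vec<Option<Part>> {
    let mut completed_parts: Vec<Option<Part>> = Vec::new();
    for &idx in finish_order {
        let total_parts = completed_parts.len();
        // The new length is never smaller than the current one, so
        // resize can only grow the vector.
        completed_parts.resize(std::cmp::max(idx + 1, total_parts), None);
        completed_parts[idx] = Some(Part(idx));
    }
    completed_parts
}
```

Calling `collect_stale_len(&[1, 0, 2])` leaves a `None` at index 1, matching the walkthrough, while `collect_fresh_len(&[1, 0, 2])` keeps all three parts.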
   
   # What changes are included in this PR?
   
   Get the size of the completed parts vector inside the while loop polling the 
individual tasks, since it changes with each iteration.
   
   # Are there any user-facing changes?
   
   No.
   

