[ 
https://issues.apache.org/jira/browse/HADOOP-18706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17714111#comment-17714111
 ] 

ASF GitHub Bot commented on HADOOP-18706:
-----------------------------------------

steveloughran commented on PR #5563:
URL: https://github.com/apache/hadoop/pull/5563#issuecomment-1514770186

   OK. I think the design is incomplete as it stands; you would really want to 
be a bit more sophisticated and
   * write the .pending multipart manifest to the temp dir as soon as the 
multipart upload is created
   * update it after every block is written
   * not allow more than one block to be written at a time (this is already 
done for S3-CSE)
   
   But this bit seems ready to go in: it is low risk and potentially useful 
for others.
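
The manifest idea in the list above could look roughly like the following. This is a minimal sketch only: the class `PendingManifest`, its line-based file format, and the use of the upload ID as the file name are all invented for illustration and are not taken from the PR or from the S3A codebase. It persists the manifest when the multipart upload is created and again after each part, using a write-then-rename so a crash should not leave a half-written file. Serializing the block uploads themselves is left to the caller.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical manifest tracker; the file format is invented for illustration. */
final class PendingManifest {
  private final Path manifest;
  private final List<String> lines = new ArrayList<>();

  /** Call as soon as the multipart upload has been created. */
  PendingManifest(Path tempDir, String key, String uploadId) throws IOException {
    // Assumption: the upload ID is safe to use as a file name.
    this.manifest = tempDir.resolve(uploadId + ".pending");
    lines.add("key=" + key);
    lines.add("uploadId=" + uploadId);
    save();
  }

  /** Call after each block upload completes; synchronized so updates are serialized. */
  synchronized void partUploaded(int partNumber, String etag) throws IOException {
    lines.add("part." + partNumber + "=" + etag);
    save();
  }

  Path path() {
    return manifest;
  }

  /** Write to a sibling file, then rename it over the manifest. */
  private void save() throws IOException {
    Path tmp = manifest.resolveSibling(manifest.getFileName() + ".tmp");
    Files.write(tmp, lines, StandardCharsets.UTF_8);
    // Whether an existing target is replaced is implementation specific;
    // on POSIX file systems the rename replaces it.
    Files.move(tmp, manifest, StandardCopyOption.ATOMIC_MOVE);
  }
}
```

A recovery tool could then read the manifest to find the object key, the upload ID, and the parts already uploaded, and match the remaining block file to the right upload.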
   
   Now, test policy.
   
   Which AWS S3 region did you run the full "mvn verify" tests of the 
hadoop-aws module against, and what options did you have on the command line?




> The temporary files for disk-block buffer aren't unique enough to recover 
> partial uploads. 
> -------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-18706
>                 URL: https://issues.apache.org/jira/browse/HADOOP-18706
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs/s3
>            Reporter: Chris Bevard
>            Priority: Minor
>              Labels: pull-request-available
>
> If an application crashes during an S3ABlockOutputStream upload and 
> fast.upload.buffer is set to disk, it's possible to complete the upload by 
> uploading the s3ablock file with putObject as the final part of the multipart 
> upload. However, if the application has multiple uploads running in parallel 
> and they're on the same part number when the application fails, then there is 
> no way to determine which file belongs to which object, and recovery of 
> either upload is impossible.
> If the temporary file name for disk buffering included the S3 key, then every 
> partial upload would be recoverable.
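
As a rough illustration of the proposal in the description, a buffer file name that embeds the object key might be built as follows. This is a sketch under stated assumptions, not the code from the PR: the helper class `KeyedBlockFiles`, the escaping rule, and the 100-character cap are all hypothetical. Only the `s3ablock` prefix comes from the issue text.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Locale;

/** Hypothetical helper; not the implementation from the PR. */
final class KeyedBlockFiles {

  /** Assumed cap so that long keys do not exceed file name length limits. */
  static final int MAX_KEY_CHARS = 100;

  /** Replace every character outside [A-Za-z0-9._-] so the key is safe in a file name. */
  static String escapeKey(String key) {
    String escaped = key.replaceAll("[^A-Za-z0-9._-]", "_");
    // Keep the tail of long keys: the end of a key is usually the most distinctive part.
    return escaped.length() <= MAX_KEY_CHARS
        ? escaped
        : escaped.substring(escaped.length() - MAX_KEY_CHARS);
  }

  /** Prefix combining the part number and the escaped key. */
  static String blockFilePrefix(int partNumber, String key) {
    return String.format(Locale.ROOT, "s3ablock-%04d-%s-", partNumber, escapeKey(key));
  }

  /** Create the buffer file; the JDK adds a random component after the prefix. */
  static Path createBlockFile(Path bufferDir, int partNumber, String key) throws IOException {
    return Files.createTempFile(bufferDir, blockFilePrefix(partNumber, key), ".tmp");
  }
}
```

With this scheme, two parallel uploads on the same part number produce different prefixes as long as their escaped keys differ. The escaping is lossy (for example, `a/b` and `a_b` map to the same text), so a real implementation would need either a reversible encoding or an extra identifier to guarantee uniqueness.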



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
