[ https://issues.apache.org/jira/browse/HADOOP-18706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17716847#comment-17716847 ]
ASF GitHub Bot commented on HADOOP-18706:
-----------------------------------------
cbevard1 commented on code in PR #5563:
URL: https://github.com/apache/hadoop/pull/5563#discussion_r1178227622
##########
hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/ITestS3ABlockOutputArray.java:
##########
@@ -79,6 +80,42 @@ public void testRegularUpload() throws IOException {
     verifyUpload("regular", 1024);
   }
+  /**
+   * Test that the DiskBlock's local file doesn't result in an error when
+   * the S3 key exceeds the max char limit of the local file system.
+   * Currently {@link java.io.File#createTempFile(String, String, File)} is
+   * relied on to handle the truncation.
+   * @throws IOException on any failure
+   */
+  @Test
+  public void testDiskBlockCreate() throws IOException {
+    S3ADataBlocks.BlockFactory diskBlockFactory =
+        new S3ADataBlocks.DiskBlockFactory(getFileSystem());
+    String s3Key = // 1024 chars
+        "very_long_s3_key__very_long_s3_key__very_long_s3_key__very_long_s3_key__" +
+        "very_long_s3_key__very_long_s3_key__very_long_s3_key__very_long_s3_key__" +
+        "very_long_s3_key__very_long_s3_key__very_long_s3_key__very_long_s3_key__" +
+        "very_long_s3_key__very_long_s3_key__very_long_s3_key__very_long_s3_key__" +
+        "very_long_s3_key__very_long_s3_key__very_long_s3_key__very_long_s3_key__" +
+        "very_long_s3_key__very_long_s3_key__very_long_s3_key__very_long_s3_key__" +
+        "very_long_s3_key__very_long_s3_key__very_long_s3_key__very_long_s3_key__" +
+        "very_long_s3_key__very_long_s3_key__very_long_s3_key__very_long_s3_key__" +
+        "very_long_s3_key__very_long_s3_key__very_long_s3_key__very_long_s3_key__" +
+        "very_long_s3_key__very_long_s3_key__very_long_s3_key__very_long_s3_key__" +
+        "very_long_s3_key__very_long_s3_key__very_long_s3_key__very_long_s3_key__" +
+        "very_long_s3_key__very_long_s3_key__very_long_s3_key__very_long_s3_key__" +
+        "very_long_s3_key__very_long_s3_key__very_long_s3_key__very_long_s3_key__" +
+        "very_long_s3_key__very_long_s3_key__very_long_s3_key__very_long_s3_key__" +
+        "very_long_s3_key";
+    S3ADataBlocks.DataBlock dataBlock = diskBlockFactory.create("spanId",
+        s3Key, 1, getFileSystem().getDefaultBlockSize(), null);
+    // the block file name and location can be viewed in the failsafe report
+    LOG.info(dataBlock.toString());
+
+    // delete the block file
+    dataBlock.innerClose();
Review Comment:
I added an assertion to make sure the tmp file is created.
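
A minimal sketch of what that assertion could look like, continuing the test
body above. getFile() is a hypothetical accessor, not a method the
S3ADataBlocks.DataBlock API is known to expose; the AssertJ Assertions class
is already used widely in the hadoop-aws test suite.

    // verify the temporary block file was actually created before closing
    File blockFile = dataBlock.getFile(); // hypothetical accessor
    Assertions.assertThat(blockFile)
        .describedAs("temporary block file %s", blockFile)
        .exists()
        .isFile();
    // the truncated name must fit the common 255-char file-name limit
    Assertions.assertThat(blockFile.getName())
        .describedAs("block file name length")
        .hasSizeLessThanOrEqualTo(255);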
> The temporary files for disk-block buffer aren't unique enough to recover
> partial uploads.
> -------------------------------------------------------------------------------------------
>
> Key: HADOOP-18706
> URL: https://issues.apache.org/jira/browse/HADOOP-18706
> Project: Hadoop Common
> Issue Type: Improvement
> Components: fs/s3
> Reporter: Chris Bevard
> Priority: Minor
> Labels: pull-request-available
>
> If an application crashes during an S3ABlockOutputStream upload, it's
> possible to complete the upload when fast.upload.buffer is set to disk by
> uploading the s3ablock file with putObject as the final part of the
> multipart upload. If the application has multiple uploads running in
> parallel when it fails, though, and they're on the same part number, then
> there is no way to determine which buffer file belongs to which object,
> and recovery of either upload is impossible.
> If the temporary file name for disk buffering included the S3 key, then
> every partial upload would be recoverable.
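
For illustration, a minimal sketch (not the PR's actual implementation) of how
a disk-buffer file name could embed the S3 key while staying within common
file-system name limits. BlockFileNames, createBlockFile and the 200-character
cap are all hypothetical:

    import java.io.File;
    import java.io.IOException;

    final class BlockFileNames {
      private BlockFileNames() {
      }

      // Sketch only: derive the temp-file prefix from the S3 key so a
      // partial upload can be matched back to its object during recovery.
      static File createBlockFile(String s3Key, long partNumber, File dir)
          throws IOException {
        // replace path separators so the key is a legal file-name component
        String sanitized = s3Key.replace('/', '_');
        // bound the prefix so prefix + part number + random suffix stays
        // under the common 255-character file-name limit
        String prefix = sanitized.length() > 200
            ? sanitized.substring(0, 200)
            : sanitized;
        return File.createTempFile(prefix + "-part" + partNumber + "-",
            ".s3ablock", dir);
      }
    }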