[ https://issues.apache.org/jira/browse/HADOOP-18706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17717097#comment-17717097 ]
ASF GitHub Bot commented on HADOOP-18706:
-----------------------------------------
steveloughran commented on code in PR #5563:
URL: https://github.com/apache/hadoop/pull/5563#discussion_r1178915082
##########
hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/ITestS3ABlockOutputArray.java:
##########
@@ -107,13 +108,18 @@ public void testDiskBlockCreate() throws IOException {
"very_long_s3_key__very_long_s3_key__very_long_s3_key__very_long_s3_key__" +
"very_long_s3_key__very_long_s3_key__very_long_s3_key__very_long_s3_key__" +
"very_long_s3_key";
-    S3ADataBlocks.DataBlock dataBlock = diskBlockFactory.create("spanId", s3Key, 1,
-        getFileSystem().getDefaultBlockSize(), null);
-    LOG.info(dataBlock.toString()); // block file name and location can be viewed in failsafe-report
-
-    // delete the block file
-    dataBlock.innerClose();
-    diskBlockFactory.close();
+    long blockSize = getFileSystem().getDefaultBlockSize();
+    try (S3ADataBlocks.BlockFactory diskBlockFactory =
+        new S3ADataBlocks.DiskBlockFactory(getFileSystem());
+        S3ADataBlocks.DataBlock dataBlock =
+            diskBlockFactory.create("spanId", s3Key, 1, blockSize, null);
+    ) {
+      boolean created = Arrays.stream(
+          Objects.requireNonNull(
+              new File(getConfiguration().get("hadoop.tmp.dir")).listFiles()))
+          .anyMatch(f -> f.getName().contains("very_long_s3_key"));
+      assertTrue(created);
Review Comment:
Add a message to print if the assert is false: we need to be able to start
debugging without having to work back from the first line of the stack trace
to figure out what went wrong. Include that hadoop.tmp.dir value in the
message too.
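
A minimal sketch of the requested change, assuming JUnit's two-argument
assertTrue(message, condition); the message wording is illustrative:

    String tmpDir = getConfiguration().get("hadoop.tmp.dir");
    boolean created = Arrays.stream(
        Objects.requireNonNull(new File(tmpDir).listFiles()))
        .anyMatch(f -> f.getName().contains("very_long_s3_key"));
    // name the scanned directory in the failure message so a report is
    // debuggable without reverse-engineering the stack trace
    assertTrue("no block file containing the s3 key was created under "
        + "hadoop.tmp.dir = " + tmpDir, created);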
> The temporary files for disk-block buffer aren't unique enough to recover
> partial uploads.
> -------------------------------------------------------------------------------------------
>
> Key: HADOOP-18706
> URL: https://issues.apache.org/jira/browse/HADOOP-18706
> Project: Hadoop Common
> Issue Type: Improvement
> Components: fs/s3
> Reporter: Chris Bevard
> Priority: Minor
> Labels: pull-request-available
>
> If an application crashes during an S3ABlockOutputStream upload and
> fast.upload.buffer is set to disk, it's possible to complete the upload by
> uploading the s3ablock file with putObject as the final part of the
> multipart upload. If the application has multiple uploads running in
> parallel, though, and they're on the same part number when the application
> fails, there is no way to determine which file belongs to which object, and
> recovery of either upload is impossible.
> If the temporary file name for disk buffering included the s3 key, then every
> partial upload would be recoverable.
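
For illustration, a sketch of the proposed naming scheme (not the actual
patch): the "s3ablock-NNNN-" prefix follows the existing block-file naming
convention, while the key-sanitizing step and the helper class itself are
assumptions made for this example:

    import java.io.File;
    import java.io.IOException;

    class BlockFileNaming {
      // Embed the (sanitized) object key in the buffer file name so an
      // orphaned .s3ablock file can be matched to its multipart upload.
      static File createBlockFile(File bufferDir, String key, int partNumber)
          throws IOException {
        String safeKey = key.replace('/', '_');  // keys may contain '/'
        String prefix =
            String.format("s3ablock-%04d-%s-", partNumber, safeKey);
        // createTempFile appends a unique suffix, so parallel uploads of the
        // same key and part number still get distinct files.
        return File.createTempFile(prefix, ".tmp", bufferDir);
      }
    }

With the key in the name, each leftover block file can be paired with its
object and the upload finished with putObject, as described above.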
--
This message was sent by Atlassian Jira
(v8.20.10#820010)