steveloughran commented on a change in pull request #609: HADOOP-16193. add
extra S3A MPU test to see what happens if a file is created during the MPU
URL: https://github.com/apache/hadoop/pull/609#discussion_r279073402
##########
File path:
hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/contract/s3a/ITestS3AContractMultipartUploader.java
##########
@@ -159,4 +170,47 @@ public void testDirectoryInTheWay() throws Exception {
public void testMultipartUploadReverseOrder() throws Exception {
ContractTestUtils.skip("skipped for speed");
}
+
+ /**
+ * This creates and then deletes a zero-byte file while an upload
+ * is in progress, and verifies that the uploaded file is ultimately
+ * visible.
+ */
+ @Test
+ public void testMultipartOverlapWithTransientFile() throws Throwable {
+ // until there's a way to explicitly ask for a multipart uploader from a
+ // specific FS, explicitly create one bonded to the raw FS.
+ describe("testMultipartOverlapWithTransientFile");
+ S3AFileSystem fs = getFileSystem();
+ Path path = path("testMultipartOverlapWithTransientFile");
+ fs.delete(path, true);
+ MultipartUploader mpu = mpu(1);
+ UploadHandle upload1 = mpu.initialize(path);
+ byte[] dataset = dataset(1024, '0', 10);
+ final Map<Integer, PartHandle> handles = new HashMap<>();
+ LOG.info("Uploading multipart entry");
+ PartHandle value = mpu.putPart(path, new ByteArrayInputStream(dataset), 1,
+ upload1,
+ dataset.length);
+ // upload 1K
+ handles.put(1, value);
+ // confirm the path is absent
+ ContractTestUtils.assertPathDoesNotExist(fs,
+ "path being uploaded", path);
+ // now create an empty file
+ ContractTestUtils.touch(fs, path);
+ final FileStatus touchStatus = fs.getFileStatus(path);
+ LOG.info("0-byte file has been created: {}", touchStatus);
+ fs.delete(path, false);
+ // now complete the upload
+ mpu.complete(path, handles, upload1);
Review comment:
I don't think I fully understand PathHandles. I thought they were an
HDFS-only thing, or is there something obvious I haven't noticed?
(Pauses, looks at code.) I see: the S3A MPU returns the etag, which on its
own isn't enough to refer to the file when reopening it.
We've been doing work on S3 and versioning: one thing I'd like to get back
from the upload is a reference which is guaranteed to open that specific
file, or fail fast if it has changed; something like (path, etag, version).
Returning these immediately from the upload in an S3AFileStatus would be one
approach, but a PathHandle for S3A which contained all of those would be
exactly what someone needs, wouldn't it?
Do you fancy getting busy on that?
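To make the (path, etag, version) idea concrete, here is a rough sketch of
what such a reference might carry and how a fail-fast check could look. It is
deliberately free of Hadoop types: `ObjectRef` and `verify` are hypothetical
names, not part of S3A or the PathHandle API, and the "current" etag and
version are assumed to come from a HEAD request made when reopening.

```java
import java.io.IOException;
import java.util.Objects;

/**
 * Hypothetical reference to one specific version of an object.
 * Illustrative only; not an existing Hadoop or S3A class.
 */
final class ObjectRef {
  private final String path;
  private final String etag;
  // May be null when the bucket is not versioned.
  private final String versionId;

  ObjectRef(String path, String etag, String versionId) {
    this.path = Objects.requireNonNull(path, "path");
    this.etag = Objects.requireNonNull(etag, "etag");
    this.versionId = versionId;
  }

  String getPath() {
    return path;
  }

  String getEtag() {
    return etag;
  }

  String getVersionId() {
    return versionId;
  }

  /**
   * Fail fast if the object currently at the path is not the one this
   * reference was created for. The arguments are the values observed
   * at open time (assumed to come from a HEAD request).
   */
  void verify(String currentEtag, String currentVersionId)
      throws IOException {
    if (!etag.equals(currentEtag)) {
      throw new IOException("etag of " + path + " changed: expected "
          + etag + " but found " + currentEtag);
    }
    // Only compare versions when the reference recorded one.
    if (versionId != null && !versionId.equals(currentVersionId)) {
      throw new IOException("version of " + path + " changed: expected "
          + versionId + " but found " + currentVersionId);
    }
  }
}
```

The upload would hand back one of these on completion, and any later open
would call `verify` before reading, so a file overwritten in between is
rejected rather than silently read.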