[ https://issues.apache.org/jira/browse/HADOOP-15576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16556138#comment-16556138 ]
Steve Loughran commented on HADOOP-15576:
-----------------------------------------

you're going to hate me for this, but here's my review, including going through all of the base test class. I'll note that when the python spec of the operations surfaces, the pre/post conditions of all the operations will show which tests are needed to try to break things, and which exceptions are expected on failure.

h3. {{S3AMultipartUploader}}

{{putPart}} to use {{writeOperations.newUploadPartRequest}}. (FWIW, just realised the partID range check may be out of date w.r.t. the latest AWS S3 limits. Feel free to remove that partID < 10000 check, as the far end will reject anything out of range, won't it?)

h3. Dependencies on HDFS

I've just seen that {{DFSUtilClient}} and various other HDFS imports are needed for all this to work. As it is separate from the S3A FS, it's probably OK to do this *for now*, but really it should be using stuff from hadoop-common fs, not hdfs-client. We should not require hdfs-client to be on the classpath for S3 IO to work. I don't see the POM explicitly pulling in HDFS except for testing, so it must be coming in transitively from somewhere. Pause: looks like it came in with the committer and its uprate of hadoop-mapreduce-client-core from test to provided. Filed HADOOP-15631 and HDFS-13766 to cover this. I don't believe that HD/Insights includes the HDFS JARs, so everything needed for multipart uploads will need to move to the hadoop-common lib. *It's too late to address in 3.2, but it does need fixing, initially within HDFS.*

h3. Tests

* {{AbstractSystemMultipartUploaderTest}} should move to {{AbstractFSContractTestBase}}. That gives you the timeouts, the thread naming, the logic for unique paths in forks for parallel test runs, and the binding mechanism used by all the contract tests.
* All asserts need to have meaningful messages so that failures in Jenkins can be tracked down.
* All uses of {{fs.open}} need to be closed in try/finally or try-with-resources.
I don't see {{IOUtils.toByteArray}} closing it. Better: use {{ContractTestUtils.readUTF8}} and have it do all the work; then you can just compare strings (which gives you better text on a mismatch).
* Before you open the file, use {{ContractTestUtils.assertPathExists}}. Why? It does a listing of the parent dir and includes it in the assertion on a failure.

{{testMultipartUpload}} and {{testMultipartUploadReverseOrder}}: if you aren't using assertArrayEquals, you'll need a message for the size mismatch, ideally including the difference. Best to convert the array to a string ({{org.apache.commons.collections.CollectionUtils}} or Java 8 streams) and include that in the message.

{{testMultipartUploadAbort}}: should use {{LambdaTestUtils.intercept}}, e.g.
{code}
intercept(FileNotFoundException.class,
    () -> fs.listStatus(file));
{code}
This will handle the rethrowing of any unexpected exception, and if the lambda returns a value, will raise an assertion *including the string value of the response*. This also means you can remove the import of {{junit.framework.TestCase.assertTrue}}, which is good, as that assert comes from the old JUnit classes.

{{testMultipartUploadAbort}} to try to {{complete()}} the upload after the abort; expect a failure. Then assert that the dest file doesn't exist with {{ContractTestUtils.assertPathDoesNotExist}}.

Extra tests to add:
* try to abort an unknown upload
* try to complete an unknown upload
* try to complete an upload twice
* try to abort an upload twice. What is expected here?
* initiate two MPUs to the same dest. What happens? (interesting q; outcome may vary by FS)
* initiate an MPU to a directory; expect failure
* ser/deser of an upload handle. Needed to detect accidental use of non-serializable stuff within a single process.
* try to delete a path partway through an upload. Again, what happens?
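On the "close every {{fs.open}}" point: the try-with-resources shape being asked for can be sketched as below. This is a plain-JDK illustration only; a local file and {{Files.newInputStream}} stand in for the Hadoop {{FileSystem}} and {{fs.open(path)}}, and the names are mine, not Hadoop's.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class ReadWithResources {

    // Read a stream fully and guarantee it is closed, even if the read fails.
    // In the real tests this would be fs.open(path) rather than
    // Files.newInputStream(p); the try-with-resources structure is the point.
    static String readFully(Path p) throws IOException {
        try (InputStream in = Files.newInputStream(p)) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        Path p = Files.createTempFile("mpu", ".txt");
        Files.write(p, "hello".getBytes(StandardCharsets.UTF_8));
        String s = readFully(p);
        if (!"hello".equals(s)) {
            throw new AssertionError("unexpected content: " + s);
        }
        System.out.println(s);
        Files.delete(p);
    }
}
```

Comparing the strings read this way (rather than raw byte arrays) is what gives the better mismatch text on failure.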
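For reference, the behaviour {{LambdaTestUtils.intercept}} provides can be sketched with a minimal re-implementation in plain Java; this is an illustrative simplification of the pattern, not Hadoop's actual code (the real helper has more overloads, e.g. message-text checks):

```java
import java.util.concurrent.Callable;

public class InterceptPattern {

    // Minimal sketch of intercept(expectedClass, lambda): returns the caught
    // exception for further asserts, rethrows anything unexpected, and fails
    // with the string value of the result if the lambda returns instead of
    // throwing.
    static <E extends Throwable> E intercept(Class<E> expected, Callable<?> eval)
            throws Exception {
        Object result;
        try {
            result = eval.call();
        } catch (Throwable t) {
            if (expected.isInstance(t)) {
                return expected.cast(t);      // the expected failure: hand it back
            }
            if (t instanceof Exception) {
                throw (Exception) t;          // unexpected exception: rethrow as-is
            }
            throw (Error) t;
        }
        throw new AssertionError("Expected " + expected.getSimpleName()
            + " but got result: " + result);
    }

    public static void main(String[] args) throws Exception {
        // Expected-exception case: the caught exception is returned.
        IllegalStateException ex = intercept(IllegalStateException.class,
            () -> { throw new IllegalStateException("no such upload"); });
        if (!"no such upload".equals(ex.getMessage())) {
            throw new AssertionError("wrong message: " + ex.getMessage());
        }

        // Returned-a-value case: intercept itself must raise an assertion.
        boolean failed = false;
        try {
            intercept(IllegalStateException.class, () -> "some value");
        } catch (AssertionError expectedFailure) {
            failed = true;
        }
        if (!failed) {
            throw new AssertionError("intercept missed the absent exception");
        }
        System.out.println("intercept pattern ok");
    }
}
```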
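The ser/deser test in the extra-tests list is just an in-process round trip through Java serialization; any non-serializable field in the handle class surfaces as a {{NotSerializableException}}. A sketch, with a hypothetical byte-array payload standing in for the real upload handle type:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class HandleRoundTrip {
    public static void main(String[] args) throws Exception {
        // Hypothetical opaque handle payload; the real test would serialize
        // the upload handle object returned by initialize().
        byte[] handle = "example-upload-id".getBytes(StandardCharsets.UTF_8);

        // Serialize within the same process...
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(handle);
        }

        // ...then deserialize and verify the payload survived unchanged.
        byte[] restored;
        try (ObjectInputStream ois = new ObjectInputStream(
                new ByteArrayInputStream(bos.toByteArray()))) {
            restored = (byte[]) ois.readObject();
        }

        if (!Arrays.equals(handle, restored)) {
            throw new AssertionError("round-trip mismatch");
        }
        System.out.println("round trip ok");
    }
}
```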
> S3A Multipart Uploader to work with S3Guard and encryption
> -----------------------------------------------------------
>
>                 Key: HADOOP-15576
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15576
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 3.2
>            Reporter: Steve Loughran
>            Assignee: Ewan Higgs
>            Priority: Blocker
>         Attachments: HADOOP-15576.001.patch
>
>
> The new Multipart Uploader API of HDFS-13186 needs to work with S3Guard, with the tests to demonstrate this:
> # move from low-level calls of the S3A client to calls of WriteOperationHelper, adding any new methods needed there.
> # Tests: the tests of HDFS-13713.
> # test execution, with -DS3Guard, -DAuth
>
> There isn't an S3A version of {{AbstractSystemMultipartUploaderTest}}, and even if there was, it might not show that S3Guard was bypassed, because there are no checks that listFiles/listStatus shows the newly committed files. Similarly, because MPU requests are initiated in S3AMultipartUploader, encryption settings aren't picked up. Files being uploaded this way *are not being encrypted*.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)