[ 
https://issues.apache.org/jira/browse/HADOOP-15576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16556138#comment-16556138
 ] 

Steve Loughran commented on HADOOP-15576:
-----------------------------------------

you're going to hate me for this, but here's my review, including going through 
all of the base test class. I'll note that when the Python spec of the 
operations surfaces, the pre/post conditions of all the operations will show 
which tests are needed to try to break things, and what exceptions are 
expected on failure. 


h3. {{S3AMultipartUploader}}

{{putPart}} to use {{writeOperations.newUploadPartRequest}}. (FWIW, I just 
realised the partID range check may be out of date w.r.t. the latest AWS S3 
limits. Feel free to remove that partID < 10000 check, as the far end will 
reject anything out of range, won't it?)
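
To be concrete, something like this is what I mean (sketch only: the putPart 
signature, the pathToKey helper and the exact newUploadPartRequest/uploadPart 
parameters are assumptions here, so check them against the real 
{{WriteOperationHelper}} rather than copying this):

{code}
// Rough sketch: delegate the part upload to WriteOperationHelper so the
// request picks up the S3A settings (encryption etc.). Signatures are assumed.
@Override
public PartHandle putPart(Path filePath, InputStream inputStream,
    int partNumber, UploadHandle uploadId, long lengthInBytes)
    throws IOException {
  String key = s3a.pathToKey(filePath);   // assumed helper: path -> object key
  String uploadIdStr = new String(uploadId.toByteArray(),
      StandardCharsets.UTF_8);
  UploadPartRequest request = writeOperations.newUploadPartRequest(
      key,                  // destination object key
      uploadIdStr,          // upload ID returned by initialize()
      partNumber,           // 1-based part number
      (int) lengthInBytes,  // size of this part
      inputStream,          // part contents
      null,                 // no source file: streaming upload
      null);                // so no offset either
  UploadPartResult result = writeOperations.uploadPart(request);
  byte[] eTag = result.getETag().getBytes(StandardCharsets.UTF_8);
  return BBPartHandle.from(ByteBuffer.wrap(eTag));
}
{code}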


h3. Dependencies on HDFS. 

I've just seen that {{DFSUtilClient}} and various other HDFS imports are needed 
for all this to work. As it is separate from the S3A FS, it's probably OK to do 
this *for now*, but really it should be using stuff from hadoop-common fs, not 
hdfs-client. We should not require hdfs-client to be on the CP for S3 IO to 
work. I don't see the POM explicitly pulling in HDFS except for testing, so it 
must be coming in transitively from somewhere. Pause: looks like it came in 
with the committer and its uprate of hadoop-mapreduce-client-core from test to 
provided. Filed HADOOP-15631 and HDFS-13766 to cover this. I don't believe that 
HDInsight includes the HDFS JARs, so everything needed for multipart uploads 
will need to move to the hadoop-common lib.

*It's too late to address in 3.2, but it does need fixing, initially within 
HDFS*

h3. Tests

* {{AbstractSystemMultipartUploaderTest}} should move to 
{{AbstractFSContractTestBase}}. That gives you the timeouts, the thread naming, 
the logic for unique paths in forks for parallel test runs, and a binding 
mechanism which is used for all the contract tests.
* All asserts need to have meaningful messages so that failures in Jenkins can 
be tracked down. 
* All uses of fs.open need to be closed in try/finally or try-with-resources. I 
don't see IOUtils.toByteArray closing it. Better: use 
{{ContractTestUtils.readUTF8}} and have it do all the work, then you can just 
compare strings (which gives you better text on a mismatch).
* And before you open the file, use {{ContractTestUtils.assertPathExists}}. 
Why? Because on a failure it does a listing of the parent dir and includes it 
in the assertion. (See the sketch after this list.)
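
Roughly the pattern those last two bullets describe; {{file}} and {{expected}} 
are placeholders for the dest path and the uploaded text, and the 
ContractTestUtils signatures used here are from memory, so double-check them:

{code}
// Assert the dest exists (failure message lists the parent dir), then read it
// back as UTF-8 and compare strings for a readable mismatch message.
ContractTestUtils.assertPathExists(fs, "dest file of multipart upload", file);
String actual = ContractTestUtils.readUTF8(fs, file, expected.length());
assertEquals("contents of " + file, expected, actual);
{code}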

{{testMultipartUpload}} and {{testMultipartUploadReverseOrder}}: if you aren't 
using assertArrayEquals, you'll need a message for the size mismatch, ideally 
including the difference. Best to convert the array to a string (with 
{{org.apache.commons.collections.CollectionUtils}} or Java 8 streams) and 
include that in the message.
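
Something along these lines, assuming byte arrays 
{{expectedBytes}}/{{actualBytes}} hold the two payloads ({{java.util.Arrays}} 
used here, but CollectionUtils or a stream works just as well):

{code}
// Build the failure message up front so a mismatch shows what was actually read.
assertEquals("payload size for " + file
        + " (actual bytes: " + Arrays.toString(actualBytes) + ")",
    expectedBytes.length, actualBytes.length);
assertArrayEquals("payload for " + file, expectedBytes, actualBytes);
{code}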

{{testMultipartUploadAbort}}: should use LambdaTestUtils.intercept, e.g.

{code}
intercept(FileNotFoundException.class, () -> fs.listStatus(file));
{code}

This will handle the rethrowing of any unexpected exception, and if the lambda 
expression returns a value, will raise an assertion *including the string value 
of the response*. This also means you can remove the import of 
{{junit.framework.TestCase.assertTrue}}, which is good as that assert is coming 
from the old JUnit classes. 

{{testMultipartUploadAbort}} to try to {{complete()}} the upload after the 
abort; expect a failure. Then assert that the dest file doesn't exist with 
{{ContractTestUtils.assertPathDoesNotExist}}.
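
Sketched out with {{mpu}}, {{partHandles}} and {{uploadHandle}} as 
placeholders; take the exact {{complete()}} signature and the expected 
exception type from the real MultipartUploader API, not from here:

{code}
// After abort(), complete() should fail and the dest file must not appear.
mpu.abort(file, uploadHandle);
intercept(IOException.class,   // exception class is a guess; tighten it up
    () -> mpu.complete(file, partHandles, uploadHandle));
ContractTestUtils.assertPathDoesNotExist(fs, "dest file of aborted upload", file);
{code}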


Extra tests to add.

- Add a test to try to abort an unknown upload
- Test to try to complete an unknown upload
- Try to complete an upload twice
- Try to abort an upload twice. What is expected here?
- Initiate two MPUs to the same dest. What happens? (interesting question; the 
outcome may vary by FS)
- Initiate an MPU to a directory; expect failure
- Ser/deser of an upload handle. Needed to detect accidental use of 
non-serializable stuff within a single process. (See the sketch below.)
- Try to delete a path partway through an upload. Again, what happens?
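
For the ser/deser one, a minimal round-trip sketch, assuming the handle is 
meant to be {{java.io.Serializable}}; if it only exposes a byte[] form, 
round-trip that and compare instead:

{code}
// Round-trip the handle through Java serialization and compare with the
// original. (Test method declares throws Exception for readObject.)
ByteArrayOutputStream buffer = new ByteArrayOutputStream();
try (ObjectOutputStream oos = new ObjectOutputStream(buffer)) {
  oos.writeObject(uploadHandle);
}
try (ObjectInputStream ois = new ObjectInputStream(
    new ByteArrayInputStream(buffer.toByteArray()))) {
  UploadHandle roundTripped = (UploadHandle) ois.readObject();
  assertEquals("round-tripped upload handle", uploadHandle, roundTripped);
}
{code}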



> S3A  Multipart Uploader to work with S3Guard and encryption
> -----------------------------------------------------------
>
>                 Key: HADOOP-15576
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15576
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 3.2
>            Reporter: Steve Loughran
>            Assignee: Ewan Higgs
>            Priority: Blocker
>         Attachments: HADOOP-15576.001.patch
>
>
> The new Multipart Uploader API of HDFS-13186 needs to work with S3Guard, with 
> the tests to demonstrate this
> # move from low-level calls of the S3A client to calls of WriteOperationHelper; 
> adding any new methods needed there.
> # Tests: the tests of HDFS-13713. 
> # test execution, with -DS3Guard, -DAuth
> There isn't an S3A version of {{AbstractSystemMultipartUploaderTest}}, and 
> even if there was, it might not show that S3Guard was bypassed, because 
> there's no checks that listFiles/listStatus shows the newly committed files.
> Similarly, because MPU requests are initiated in S3AMultipartUploader, 
> encryption settings aren't picked up. Files being uploaded this way *are not 
> being encrypted*


