[
https://issues.apache.org/jira/browse/HADOOP-15576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16556138#comment-16556138
]
Steve Loughran commented on HADOOP-15576:
-----------------------------------------
you're going to hate me for this, but here's my review, including going through
the whole of the base test class. I'll note that when that Python spec of the
operations surfaces, the pre/post conditions of all the operations will show
which tests are needed to try and break things, and what exceptions are
expected on failure.
h3. {{S3AMultipartUploader}}
{{putPart}} to use {{writeOperations.newUploadPartRequest}}. (FWIW, just
realised the partID range check may be out of date w.r.t. the latest AWS S3
limits. Feel free to remove that partID < 10000 check; the far end will
reject anything out of range anyway, won't it?)
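Roughly what I have in mind, as a sketch only; the exact
{{newUploadPartRequest}} parameters here are from memory, so treat the
signature as an assumption and match it to what {{WriteOperationHelper}}
actually exposes:
{code}
// sketch: build the part request through the helper so it picks up the
// FS-level settings (encryption etc.) instead of going to the raw client
UploadPartRequest request = writeOperations.newUploadPartRequest(
    key,                  // destination object key
    uploadIdString,       // ID of the initiated MPU
    partNumber,           // 1-based part number
    (int) lengthInBytes,  // size of this part
    inputStream,          // part contents
    null,                 // no source file
    null);                // no offset
UploadPartResult result = client.uploadPart(request);
{code}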
h3. Dependencies on HDFS.
I've just seen that {{DFSUtilClient}} and various other HDFS imports are needed
for all this to work. As it is separate from the S3A FS, it's probably OK to do
this *for now*, but really it should be using stuff from hadoop-common fs, not
hdfs-client. We should not require hdfs-client to be on the CP for S3 IO to
work. I don't see the POM explicitly pulling in HDFS except for testing, so it
must be coming in transitively from somewhere. Pause: looks like it came in
with the committer and its uprate of hadoop-mapreduce-client-core from test to
provided. Filed HADOOP-15631 and HDFS-13766 to cover this. I don't believe that
HD/Insights includes the HDFS JARs, so everything needed for multipart uploads
will need to move to the hadoop-common lib.
*It's too late to address in 3.2, but it does need fixing, initially within
HDFS*
h3. Tests
* {{AbstractSystemMultipartUploaderTest}} should move to
{{AbstractFSContractTestBase}}. That gives you the timeouts, the thread naming,
the logic for unique paths in forks for parallel test runs, and a binding
mechanism which is used for all the contract tests.
* All asserts need to have meaningful messages so that failures in Jenkins can
be tracked down.
* All uses of {{fs.open()}} need to be closed in try/finally or
try-with-resources. I don't see {{IOUtils.toByteArray}} closing it. Better: use
{{ContractTestUtils.readUTF8}} and have it do all the work; then you can just
compare strings (which gives you better text on a mismatch).
* And before you open the file, use {{ContractTestUtils.assertPathExists}}.
Why? Because it does a listing of the parent dir and includes it in the
assertion on a failure. There's a sketch covering these points just after this
list.
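Roughly what I'd expect such a test to look like, as a sketch only: the class
and test names are illustrative, and the upload step itself is elided:
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.contract.AbstractFSContract;
import org.apache.hadoop.fs.contract.AbstractFSContractTestBase;
import org.apache.hadoop.fs.contract.ContractTestUtils;
import org.apache.hadoop.fs.contract.s3a.S3AContract;
import org.junit.Test;

public class TestS3AMultipartUploader extends AbstractFSContractTestBase {

  @Override
  protected AbstractFSContract createContract(Configuration conf) {
    // binds the test to S3A, the same way the other contract tests do
    return new S3AContract(conf);
  }

  @Test
  public void testUploadedFileIsReadable() throws Throwable {
    FileSystem fs = getFileSystem();
    // path() gives a path unique to this test/fork
    Path file = path("testUploadedFileIsReadable");
    String expected = "part-0001";
    // ... multipart-upload the UTF-8 bytes of `expected` to `file` here ...
    // assertPathExists lists the parent dir in the failure text
    ContractTestUtils.assertPathExists(fs, "upload did not materialize", file);
    // readUTF8 opens and closes the stream itself; string equality gives a
    // readable message on a mismatch
    String actual = ContractTestUtils.readUTF8(fs, file, -1);
    assertEquals("wrong contents of " + file, expected, actual);
  }
}
{code}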
{{testMultipartUpload}} and {{testMultipartUploadReverseOrder()}}: if you
aren't using assertArrayEquals, you'll need a message for the size mismatch,
ideally including the difference. Best to convert the array to a string
({{org.apache.commons.collections.CollectionUtils}} or Java 8 streams) and
include that in the message, e.g. the sketch below.
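Something like this fragment inside the test (sketch only; {{payload}} and
{{contents}} are placeholder names, and plain {{java.util.Arrays}} is the
simplest route):
{code}
// sketch: make a size mismatch self-describing by printing both arrays
assertEquals("Wrong upload size."
        + " expected=" + Arrays.toString(payload)
        + " actual=" + Arrays.toString(contents),
    payload.length, contents.length);
{code}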
{{testMultipartUploadAbort}}: should use {{LambdaTestUtils.intercept}}, e.g.
{code}
intercept(FileNotFoundException.class, () -> fs.listStatus(file));
{code}
This will handle the rethrowing of any unexpected exception, and, if the
lambda expression returns a value, will raise an assertion *including the
string value of the response*. This also means you can remove the import of
{{junit.framework.TestCase.assertTrue}}, which is good, as that assert is
coming from the old JUnit classes.
{{testMultipartUploadAbort}} should also try to {{complete()}} the upload after
the abort; expect a failure. Then assert that the dest file doesn't exist with
{{ContractTestUtils.assertPathDoesNotExist}}.
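Sketched out, assuming the HDFS-13186 {{MultipartUploader}} signatures (adjust
the argument order and exception type to whatever the API actually declares):
{code}
// sketch: after abort(), complete() must fail and nothing must materialize
uploader.abort(file, uploadHandle);
// the exact exception type is an assumption; an IOException at minimum
intercept(IOException.class,
    () -> uploader.complete(file, partHandles, uploadHandle));
ContractTestUtils.assertPathDoesNotExist(fs,
    "aborted upload must not create the dest file", file);
{code}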
Extra tests to add:
- Add a test to try and abort an unknown upload
- test to try to complete unknown upload
- try to complete an upload twice
- try to abort an upload twice. What is expected here?
- initiate two MPUs to the same dest. What happens? (interesting q; outcome may
vary on FS)
- Initiate MPU to a directory; expect failure
- Ser/deser of an upload handle (sketch below). Needed to detect accidental use
of non-serializable stuff within a single process.
- try to delete a path partway through an upload. Again, what happens?
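For the ser/deser one, a round-trip through the wire format should be enough to
catch it; this assumes the {{UploadHandle.toByteArray()}} and
{{BBUploadHandle.from()}} API from HDFS-13186, and the other names are
placeholders:
{code}
// sketch: round-trip the handle through bytes and drive the upload with the
// restored copy, to catch any reliance on in-process state in the handle
UploadHandle handle = uploader.initialize(file);
byte[] wire = handle.toByteArray();
UploadHandle restored = BBUploadHandle.from(ByteBuffer.wrap(wire));
// the rest of the upload must still work with the restored handle
PartHandle part = uploader.putPart(file, payloadStream, 1, restored, len);
{code}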
> S3A Multipart Uploader to work with S3Guard and encryption
> -----------------------------------------------------------
>
> Key: HADOOP-15576
> URL: https://issues.apache.org/jira/browse/HADOOP-15576
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3
> Affects Versions: 3.2
> Reporter: Steve Loughran
> Assignee: Ewan Higgs
> Priority: Blocker
> Attachments: HADOOP-15576.001.patch
>
>
> The new Multipart Uploader API of HDFS-13186 needs to work with S3Guard, with
> the tests to demonstrate this:
> # move from low-level calls of the S3A client to calls of
> WriteOperationHelper, adding any new methods as needed there.
> # Tests: the tests of HDFS-13713.
> # test execution, with -DS3Guard, -DAuth
> There isn't an S3A version of {{AbstractSystemMultipartUploaderTest}}, and
> even if there were, it might not show that S3Guard was bypassed, because
> there are no checks that listFiles/listStatus show the newly committed files.
> Similarly, because MPU requests are initiated in S3AMultipartUploader,
> encryption settings aren't picked up. Files being uploaded this way *are not
> being encrypted*.