[
https://issues.apache.org/jira/browse/HADOOP-13164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Steve Loughran updated HADOOP-13164:
------------------------------------
Attachment: HADOOP-13164-branch-005.patch
# using {{ArrayList<>}} for list; no need to go near guava if we can help it.
# tuned loop to not be a while true + break on root, but just be a
{{while(!isRootPath(f))}}
# moved check for an empty list into {{removeKeys()}}, so it is everywhere.
# cut test out of {{ITestS3AContractRename}}. This test is *only* for testing
compliance with the FS spec, and were it not for consistency problems
surfacing, should have no test cases at all, just a materialized subclass of
{{AbstractContractRenameTest}}
# moved this test in {{ITestS3AFileOperationCost.testFakeDirectoryDeletion then
added assertions about metric state during all the operations.
# cut test case from {{ITestS3ADirectoryPerformance}} as it didn't contain a
single assertion. This is not a test; it is a demo.
I really want to call out the test changes. A key benefit of adding all those
metrics is that it lets us make assertions about the no of requests made of S3,
*independent of execution time*. The tests in patch 004 don't really check
*anything*. They log stuff, but that's for humans to read, not anything which
can be automated in a jenkins build.
* the initial purpose of tests is to demonstrate that the code not only behaves
as expected, but doesn't break when you try to break it
* the secondary purpose of the test is to catch regressions: to make sure your
work holds even in the presence of changes by others (including your future
self).
* all tests need to be fully automated: the results must be verifiable in
Jenkins, rather than reviewed in the logs.
This is [the style
guide|https://github.com/steveloughran/formality/blob/master/styleguide/styleguide.md]
I try to follow when coding tests, and what I like to see in reviews. Yes, it
usually makes writing the tests harder than the actual production code. Which
is good: production code is ideally simple and reliable. Tests are are designed
to break that code and use assertions to show that they have broken it. This
takes effort —but is necessary to keep and improve code quality.
> Optimize S3AFileSystem::deleteUnnecessaryFakeDirectories
> --------------------------------------------------------
>
> Key: HADOOP-13164
> URL: https://issues.apache.org/jira/browse/HADOOP-13164
> Project: Hadoop Common
> Issue Type: Bug
> Components: fs/s3
> Affects Versions: 2.8.0
> Reporter: Rajesh Balamohan
> Assignee: Rajesh Balamohan
> Priority: Minor
> Attachments: HADOOP-13164-branch-005.patch,
> HADOOP-13164-branch-2-003.patch, HADOOP-13164-branch-2-004.patch,
> HADOOP-13164.branch-2-002.patch, HADOOP-13164.branch-2.WIP.002.patch,
> HADOOP-13164.branch-2.WIP.patch
>
>
> https://github.com/apache/hadoop/blob/27c4e90efce04e1b1302f668b5eb22412e00d033/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java#L1224
> deleteUnnecessaryFakeDirectories is invoked in S3AFileSystem during rename
> and on outputstream close() to purge any fake directories. Depending on the
> nesting in the folder structure, it might take a lot longer time as it
> invokes getFileStatus multiple times. Instead, it should be able to break
> out of the loop once a non-empty directory is encountered.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]