[jira] [Updated] (HADOOP-13164) Optimize S3AFileSystem::deleteUnnecessaryFakeDirectories

Steve Loughran (JIRA) Wed, 28 Sep 2016 04:38:47 -0700

     [ 
https://issues.apache.org/jira/browse/HADOOP-13164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Steve Loughran updated HADOOP-13164:
------------------------------------
    Attachment: HADOOP-13164-branch-005.patch

# using {{ArrayList<>}} for list; no need to go near guava if we can help it.
# tuned loop to not be a while true + break on root, but just be a 
{{while(!isRootPath(f))}}
# moved check for an empty list into {{removeKeys()}}, so it is everywhere.
# cut test out of {{ITestS3AContractRename}}. This test is *only* for testing 
compliance with the FS spec, and were it not for consistency problems 
surfacing, should have no test cases at all, just a materialized subclass of 
{{AbstractContractRenameTest}}
# moved this test in {{ITestS3AFileOperationCost.testFakeDirectoryDeletion then 
added assertions about metric state during all the operations.
# cut test case from {{ITestS3ADirectoryPerformance}} as it didn't contain a 
single assertion. This is not a test; it is a demo.

I really want to call out the test changes. A key benefit of adding all those 
metrics is that it lets us make assertions about the no of requests made of S3, 
*independent of execution time*. The tests in patch 004 don't really check 
*anything*. They log stuff, but that's for humans to read, not anything which 
can be automated in a jenkins build.

* the initial purpose of tests is to demonstrate that the code not only behaves 
as expected, but doesn't break when you try to break it
* the secondary purpose of the test is to catch regressions: to make sure your 
work holds even in the presence of changes by others (including your future 
self).
* all tests need to be fully automated: the results must be verifiable in 
Jenkins, rather than reviewed in the logs.

This is [the style 
guide|https://github.com/steveloughran/formality/blob/master/styleguide/styleguide.md]
 I try to follow when coding tests, and what I like to see in reviews. Yes, it 
usually makes writing the tests harder than the actual production code. Which 
is good: production code is ideally simple and reliable. Tests are are designed 
to break that code and use assertions to show that they have broken it. This 
takes effort —but is necessary to keep and improve code quality.



> Optimize S3AFileSystem::deleteUnnecessaryFakeDirectories
> --------------------------------------------------------
>
>                 Key: HADOOP-13164
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13164
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs/s3
>    Affects Versions: 2.8.0
>            Reporter: Rajesh Balamohan
>            Assignee: Rajesh Balamohan
>            Priority: Minor
>         Attachments: HADOOP-13164-branch-005.patch, 
> HADOOP-13164-branch-2-003.patch, HADOOP-13164-branch-2-004.patch, 
> HADOOP-13164.branch-2-002.patch, HADOOP-13164.branch-2.WIP.002.patch, 
> HADOOP-13164.branch-2.WIP.patch
>
>
> https://github.com/apache/hadoop/blob/27c4e90efce04e1b1302f668b5eb22412e00d033/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java#L1224
> deleteUnnecessaryFakeDirectories is invoked in S3AFileSystem during rename 
> and on outputstream close() to purge any fake directories. Depending on the 
> nesting in the folder structure, it might take a lot longer time as it 
> invokes getFileStatus multiple times.  Instead, it should be able to break 
> out of the loop once a non-empty directory is encountered. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[jira] [Updated] (HADOOP-13164) Optimize S3AFileSystem::deleteUnnecessaryFakeDirectories

Reply via email to