[
https://issues.apache.org/jira/browse/HADOOP-12875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15231362#comment-15231362
]
Chris Nauroth commented on HADOOP-12875:
----------------------------------------
[~vishwajeet.dusane], thank you for providing a patch to make use of the
contract tests. I have a few comments in addition to the helpful feedback from
Tony.
How can I (and other Apache community members) obtain account credentials that
we can put into contract-test-options.xml, so that we can run the tests? I
have an Azure subscription. I checked manage.windowsazure.com, but I couldn't
find an option for provisioning Azure Data Lake access. It's going to be vital
for ongoing maintenance that community members have a way to get credentials so
that they can test patches against the live service before committing.
For configuration of the credentials, I recommend using a technique we've used
in hadoop-aws and hadoop-azure to split the credentials into a separate XML
file, which then gets XIncluded from the main XML file. We can then place the
name of the file with the credentials into .gitignore. This helps prevent
accidentally committing someone's private credentials to the Apache repo, which
would then compromise the account. Check out hadoop-aws and hadoop-azure for
more details on how to do this.
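For illustration, the wiring looks roughly like this (the file name {{auth-keys.xml}} is the convention used in hadoop-aws; any .gitignore'd name works):
{code}
<configuration xmlns:xi="http://www.w3.org/2001/XInclude">
  <!-- Credentials live in a separate file that is listed in .gitignore,
       so they can never be committed by accident. -->
  <xi:include href="auth-keys.xml"/>
  <!-- Non-secret contract-test options can stay in this file. -->
</configuration>
{code}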
{code}
System.setProperty("hadoop.home.dir", System.getProperty("user.dir"));
{code}
Why is this necessary?
I'm unclear on the intent of the various "benchmark" tests. They use a mock
back-end, so they aren't really providing an accurate benchmark of the true
service interaction. There are no assertions, so they aren't verifying
functionality beyond making sure things don't throw exceptions. They print
timing information to the console, so is the expectation that these tests could
be used for manual measurement before and after applying later patches?
Your time measurements are using {{System#currentTimeMillis}}, which is
subject to inaccuracy if the system clock changes or NTP makes a negative
adjustment in the middle of a test run. Instead, I recommend using
{{org.apache.hadoop.util.Time#monotonicNow}}, a wrapper over
{{System#nanoTime}} that is guaranteed to be monotonically increasing.
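To sketch the difference (self-contained here, with a plain {{System#nanoTime}} wrapper standing in for {{Time#monotonicNow}} since the Hadoop class isn't on this snippet's classpath):
{code}
public class MonotonicTimer {
  // Stand-in for org.apache.hadoop.util.Time#monotonicNow: System#nanoTime
  // is monotonic, unlike System#currentTimeMillis, so elapsed-time deltas
  // are immune to wall-clock adjustments.
  static long monotonicNowMs() {
    return System.nanoTime() / 1_000_000L;
  }

  public static void main(String[] args) throws InterruptedException {
    long start = monotonicNowMs();
    Thread.sleep(10);
    long elapsedMs = monotonicNowMs() - start;
    // Never negative, even if NTP steps the system clock mid-run.
    System.out.println("elapsed ms: " + elapsedMs);
  }
}
{code}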
{code}
  @Override
  protected AbstractFSContract createContract(Configuration configuration) {
    try {
      return new AdlStorageContract(configuration);
    } catch (URISyntaxException e) {
      return null;
    } catch (IOException e) {
      return null;
    }
  }
{code}
If any of these exceptions is thrown, returning null is likely to cause a
confusing {{NullPointerException}} later. I'd prefer that we fail fast by
throwing an unchecked exception, such as {{IllegalStateException}}, with a
descriptive error message and the original exception nested as the root cause.
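Something along these lines (sketched self-contained here, with {{java.net.URI}} standing in for the contract construction; the real code would wrap {{new AdlStorageContract(configuration)}} the same way):
{code}
import java.net.URI;
import java.net.URISyntaxException;

public class FailFastSketch {
  // Stand-in for createContract: on failure, throw an unchecked exception
  // with a descriptive message and the original exception as root cause,
  // instead of returning null.
  static URI createContract(String uri) {
    try {
      return new URI(uri);
    } catch (URISyntaxException e) {
      throw new IllegalStateException(
          "Unable to create ADL storage contract for " + uri, e);
    }
  }

  public static void main(String[] args) {
    try {
      createContract("://not a valid uri");
    } catch (IllegalStateException e) {
      // Both the descriptive message and the nested root cause survive.
      System.out.println(e.getMessage());
      System.out.println("root cause: "
          + e.getCause().getClass().getSimpleName());
    }
  }
}
{code}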
It's unusual to see contract test subclasses adding other test cases specific
to the file system, like {{readCombinationTest}}. The abstract contract test
classes are meant to fully define the test cases, and then the subclasses
usually just tweak the contract and skip tests that they aren't able to satisfy
yet. For clarity, I suggest refactoring those additional tests into separate
suites.
> [Azure Data Lake] Support for contract test and unit test cases
> ---------------------------------------------------------------
>
> Key: HADOOP-12875
> URL: https://issues.apache.org/jira/browse/HADOOP-12875
> Project: Hadoop Common
> Issue Type: Test
> Components: fs, fs/azure, tools
> Reporter: Vishwajeet Dusane
> Assignee: Vishwajeet Dusane
> Attachments: Hadoop-12875-001.patch
>
>
> This JIRA describes contract test and unit test cases support for azure data
> lake file system.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)