[ 
https://issues.apache.org/jira/browse/HADOOP-12875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15231362#comment-15231362
 ] 

Chris Nauroth commented on HADOOP-12875:
----------------------------------------

[~vishwajeet.dusane], thank you for providing a patch to make use of the 
contract tests.  I have a few comments in addition to the helpful feedback from 
Tony.

How can I (and other Apache community members) obtain account credentials that 
we can put into contract-test-options.xml, so that we can run the tests?  I 
have an Azure subscription.  I checked manage.windowsazure.com, but I couldn't 
find an option for provisioning Azure Data Lake access.  It's going to be vital 
for ongoing maintenance that community members have a way to get credentials so 
that they can test patches against the live service before committing.

For configuration of the credentials, I recommend using a technique we've used 
in hadoop-aws and hadoop-azure to split the credentials into a separate XML 
file, which then gets XIncluded from the main XML file.  We can then place the 
name of the file with the credentials into .gitignore.  This helps prevent 
accidentally committing someone's private credentials to the Apache repo, which 
would then compromise the account.  Check out hadoop-aws and hadoop-azure for 
more details on how to do this.
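For illustration, a sketch of the XInclude pattern (the file name {{auth-keys.xml}} here is an assumption; check hadoop-aws and hadoop-azure for the exact names they use):

{code:xml}
<!-- contract-test-options.xml: non-secret options live here; -->
<!-- credentials live in a separate, .gitignore'd file. -->
<configuration xmlns:xi="http://www.w3.org/2001/XInclude">
  <xi:include href="auth-keys.xml"/>
  <!-- other contract test options -->
</configuration>
{code}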

{code}
    System.setProperty("hadoop.home.dir", System.getProperty("user.dir"));
{code}

Why is this necessary?

I'm unclear on the intent of the various "benchmark" tests.  They use a mock 
back-end, so they aren't really providing an accurate benchmark of the true 
service interaction.  There are no assertions, so they aren't verifying 
functionality beyond making sure things don't throw exceptions.  They print 
timing information to the console, so is the expectation that these tests could 
be used for manual measurement before and after applying later patches?

Your time measurements use {{System#currentTimeMillis}}, which can be inaccurate 
if the system clock changes or NTP makes a negative adjustment in the middle of 
a test run.  Instead, I recommend {{org.apache.hadoop.util.Time#monotonicNow}}, 
a wrapper over {{System#nanoTime}} that is guaranteed to be monotonically 
increasing.
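A minimal self-contained sketch of the pattern (using {{System#nanoTime}} directly here, since that is what {{Time#monotonicNow}} wraps; in the tests themselves the Hadoop utility is the right call):

{code}
public class MonotonicTimingSketch {

  // Mirrors the idea of Time#monotonicNow: nanoTime scaled to milliseconds.
  static long monotonicNow() {
    return System.nanoTime() / 1_000_000L;
  }

  public static void main(String[] args) throws InterruptedException {
    long start = monotonicNow();
    Thread.sleep(50);  // stand-in for the operation being measured
    long elapsedMs = monotonicNow() - start;
    // Unlike currentTimeMillis, this cannot go negative if the
    // wall clock is stepped backwards mid-run.
    System.out.println("elapsed ms: " + elapsedMs);
  }
}
{code}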

{code}
  @Override
  protected AbstractFSContract createContract(Configuration configuration) {
    try {
      return new AdlStorageContract(configuration);
    } catch (URISyntaxException e) {
      return null;
    } catch (IOException e) {
      return null;
    }
  }
{code}

If either of these exceptions is thrown, returning null is likely to cause a 
confusing {{NullPointerException}} later.  I'd prefer that we fail fast by 
throwing an unchecked exception, such as {{IllegalStateException}}, with a 
descriptive error message and the original exception nested as the root cause.
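Something along these lines (class names taken from the quoted patch; the message text is just a suggestion):

{code}
  @Override
  protected AbstractFSContract createContract(Configuration configuration) {
    try {
      return new AdlStorageContract(configuration);
    } catch (URISyntaxException | IOException e) {
      // Fail fast with the original exception as root cause, instead of
      // returning null and triggering an NPE later in the test run.
      throw new IllegalStateException(
          "Failed to create ADL storage contract: " + e, e);
    }
  }
{code}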

It's unusual to see contract test subclasses adding other test cases specific 
to the file system, like {{readCombinationTest}}.  The abstract contract test 
classes are meant to fully define the test cases, and then the subclasses 
usually just tweak the contract and skip tests that they aren't able to satisfy 
yet.  For clarity, I suggest refactoring those additional tests into separate 
suites.


> [Azure Data Lake] Support for contract test and unit test cases
> ---------------------------------------------------------------
>
>                 Key: HADOOP-12875
>                 URL: https://issues.apache.org/jira/browse/HADOOP-12875
>             Project: Hadoop Common
>          Issue Type: Test
>          Components: fs, fs/azure, tools
>            Reporter: Vishwajeet Dusane
>            Assignee: Vishwajeet Dusane
>         Attachments: Hadoop-12875-001.patch
>
>
> This JIRA describes contract test and unit test cases support for azure data 
> lake file system.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
