[jira] [Commented] (HADOOP-13037) Azure Data Lake Client: Support Azure data lake as a file system in Hadoop

Chris Douglas (JIRA) Tue, 01 Nov 2016 23:07:20 -0700

    [ 
https://issues.apache.org/jira/browse/HADOOP-13037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15627882#comment-15627882
 ]


Chris Douglas commented on HADOOP-13037:
----------------------------------------

Thanks for working through the build issues, and following up on the review 
feedback [~vishwajeet.dusane].

This may not require a branch. While the patch is large, much of it is removing 
code (to sever the WebHDFS dependency) and adding new tests. The former can be 
committed directly to trunk, and the latter we can commit in one or more 
followup issues (as subtasks of HADOOP-13257?). Any objections to this approach?

A few minor comments from skimming the latest patch:
* {{AccessTokenProviderCache}} uses an unusual singleton pattern. Does it need 
to create a private instance of itself, rather than initializing a static map 
directly?
* {{AdlPermission#equals}} and {{#hashcode}} implementations are identical to 
what they override. Instead of reading from the config to report {{getAclBit}} 
(which is expensive, and unstable) it could either store the boolean or return 
basic {{FsPermission}} instances when disabled (and conversely, 
{{AdlPermission}} can always return true when the config knob is set). That 
this is a configurable boolean is surprising, and could be explained in this 
class's (empty) javadoc.
* {{TokenProviderType::fromString}} is unused
* In initialize, {{accountFQDN}} init can move up, so the 
{{accessTokenProvider}} init can be contiguous
* Both the subclasses of {{AzureADTokenProvider}} throw exceptions for all 
abstract methods save init, and appear not to use the private instance they 
configure. The init pattern here is pretty complicated, with {{initialize}} 
creating an instance and adding it to a global cache, and not using the 
reference it retains. Can this be simplified? If the instance were returned, 
{{AccessTokenProviderCache}} additions can be moved to {{AdlFileSystem}}, so 
all the get/add calls are in one class.
* The {{TokenProviderCache}} type seems unnecessary, if {{AzureADTP}} instances 
can return null to prevent caching (instead of the instanceof/cast)
* It's a matter of taste, but importing all the static members of 
{{AdlConfKeys}} in {{AdlFileSystem}} (or none of them) is easier to read.
* The {{<P>}} tags are unnecessary in javadocs. Some are also oddly formatted.
* Some of the test javadocs still refer to {{BufferManager.java}} and related 
classes that have been removed

> Azure Data Lake Client: Support Azure data lake as a file system in Hadoop
> --------------------------------------------------------------------------
>
>                 Key: HADOOP-13037
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13037
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs, fs/azure, tools
>            Reporter: Shrikant Naidu
>            Assignee: Vishwajeet Dusane
>             Fix For: 2.9.0
>
>         Attachments: HADOOP-13037 Proposal.pdf, HADOOP-13037-001.patch, 
> HADOOP-13037-002.patch, HADOOP-13037-003.patch
>
>
> The jira proposes an improvement over HADOOP-12666 to remove webhdfs 
> dependencies from the ADL file system client and build out a standalone 
> client. At a high level, this approach would extend the Hadoop file system 
> class to provide an implementation for accessing Azure Data Lake. The scheme 
> used for accessing the file system will continue to be 
> adl://<accountname>.azuredatalake.net/path/to/file. 
> The Azure Data Lake Cloud Store will continue to provide a webHDFS rest 
> interface. The client will  access the ADLS store using WebHDFS Rest APIs 
> provided by the ADLS store. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org

[jira] [Commented] (HADOOP-13037) Azure Data Lake Client: Support Azure data lake as a file system in Hadoop

Reply via email to