[ https://issues.apache.org/jira/browse/HADOOP-13037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15627882#comment-15627882 ]
Chris Douglas commented on HADOOP-13037: ---------------------------------------- Thanks for working through the build issues, and following up on the review feedback [~vishwajeet.dusane]. This may not require a branch. While the patch is large, much of it is removing code (to sever the WebHDFS dependency) and adding new tests. The former can be committed directly to trunk, and the latter we can commit in one or more followup issues (as subtasks of HADOOP-13257?). Any objections to this approach? A few minor comments from skimming the latest patch: * {{AccessTokenProviderCache}} uses an unusual singleton pattern. Does it need to create a private instance of itself, rather than initializing a static map directly? * {{AdlPermission#equals}} and {{#hashcode}} implementations are identical to what they override. Instead of reading from the config to report {{getAclBit}} (which is expensive, and unstable) it could either store the boolean or return basic {{FsPermission}} instances when disabled (and conversely, {{AdlPermission}} can always return true when the config knob is set). That this is a configurable boolean is surprising, and could be explained in this class's (empty) javadoc. * {{TokenProviderType::fromString}} is unused * In initialize, {{accountFQDN}} init can move up, so the {{accessTokenProvider}} init can be contiguous * Both the subclasses of {{AzureADTokenProvider}} throw exceptions for all abstract methods save init, and appear not to use the private instance they configure. The init pattern here is pretty complicated, with {{initialize}} creating an instance and adding it to a global cache, and not using the reference it retains. Can this be simplified? If the instance were returned, {{AccessTokenProviderCache}} additions can be moved to {{AdlFileSystem}}, so all the get/add calls are in one class. * The {{TokenProviderCache}} type seems unnecessary, if {{AzureADTP}} instances can return null to prevent caching (instead of the instanceof/cast) * It's a matter of taste, but importing all the static members of {{AdlConfKeys}} in {{AdlFileSystem}} (or none of them) is easier to read. * The {{<P>}} tags are unnecessary in javadocs. Some are also oddly formatted. * Some of the test javadocs still refer to {{BufferManager.java}} and related classes that have been removed > Azure Data Lake Client: Support Azure data lake as a file system in Hadoop > -------------------------------------------------------------------------- > > Key: HADOOP-13037 > URL: https://issues.apache.org/jira/browse/HADOOP-13037 > Project: Hadoop Common > Issue Type: Improvement > Components: fs, fs/azure, tools > Reporter: Shrikant Naidu > Assignee: Vishwajeet Dusane > Fix For: 2.9.0 > > Attachments: HADOOP-13037 Proposal.pdf, HADOOP-13037-001.patch, > HADOOP-13037-002.patch, HADOOP-13037-003.patch > > > The jira proposes an improvement over HADOOP-12666 to remove webhdfs > dependencies from the ADL file system client and build out a standalone > client. At a high level, this approach would extend the Hadoop file system > class to provide an implementation for accessing Azure Data Lake. The scheme > used for accessing the file system will continue to be > adl://<accountname>.azuredatalake.net/path/to/file. > The Azure Data Lake Cloud Store will continue to provide a webHDFS rest > interface. The client will access the ADLS store using WebHDFS Rest APIs > provided by the ADLS store. -- This message was sent by Atlassian JIRA (v6.3.4#6332) --------------------------------------------------------------------- To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org