[jira] [Commented] (HADOOP-12666) Support Microsoft Azure Data Lake - as a file system in Hadoop

Chris Nauroth (JIRA) Tue, 08 Mar 2016 22:41:09 -0800

    [ https://issues.apache.org/jira/browse/HADOOP-12666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15186621#comment-15186621 ]


Chris Nauroth commented on HADOOP-12666:
----------------------------------------

[~chris.douglas], thank you for the excellent summary.  I agree with the 
concerns about coupling with WebHDFS client internals.  Asking for a WebHDFS 
patch was a way to make that coupling clearer to reviewers by making the 
relaxed visibility explicit.

Something else to keep in mind is that the audience of contributors on WebHDFS 
is likely to be different from the audience of contributors on Azure Data Lake.
If a change in the WebHDFS client forces a change in an Azure Data Lake 
subclass to satisfy compilation, then it's not guaranteed that the WebHDFS 
contributor is going to be well-equipped to make that change correctly and test 
it.

The only way I see to break this coupling is for Azure Data Lake to implement 
its own client, without relying on inheritance from WebHDFS for code reuse.  
There are already significant protocol deviations from WebHDFS, so an ADL 
subclass is going to violate the Liskov substitution principle.  (You can never 
substitute an instance of {{WebHdfsFileSystem}} with an instance of 
{{PrivateAzureDataLakeFileSystem}}, because that won't work correctly with the 
WebHDFS back-end.)
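To make the substitution argument concrete, here is a minimal Java sketch. Every name in it (`WebHdfsStyleClient`, `AdlStyleClient`, `strictWebHdfsBackend`, the `adlExtension` query parameter) is hypothetical and does not reflect the real {{WebHdfsFileSystem}} API or the actual ADL protocol. It only assumes that the subclass alters the wire format and that the WebHDFS back-end rejects requests outside its protocol.

```java
import java.util.List;

public class SubstitutionSketch {

    // Hypothetical base client: builds request URLs in a WebHDFS-like format.
    static class WebHdfsStyleClient {
        String requestUrl(String path, String op) {
            return "/webhdfs/v1" + path + "?op=" + op;
        }
    }

    // Hypothetical subclass that reuses the base class through inheritance
    // but deviates from the protocol by appending an extra parameter.
    static class AdlStyleClient extends WebHdfsStyleClient {
        @Override
        String requestUrl(String path, String op) {
            return super.requestUrl(path, op) + "&adlExtension=true";
        }
    }

    // Hypothetical strict back-end: accepts only the exact base format.
    static boolean strictWebHdfsBackend(String url) {
        return url.matches("/webhdfs/v1/[^?]*\\?op=[A-Z]+");
    }

    // Caller written against the base type. Substitutability means any
    // subtype should work here against the WebHDFS back-end.
    static boolean open(WebHdfsStyleClient client, String path) {
        return strictWebHdfsBackend(client.requestUrl(path, "OPEN"));
    }

    public static void main(String[] args) {
        List<WebHdfsStyleClient> clients =
            List.of(new WebHdfsStyleClient(), new AdlStyleClient());
        for (WebHdfsStyleClient c : clients) {
            // Prints "accepted: true" for the base client and
            // "accepted: false" for the deviating subclass.
            System.out.println(c.getClass().getSimpleName() + " accepted: "
                + open(c, "/data/file.txt"));
        }
    }
}
```

The subclass compiles and type-checks as a {{WebHdfsStyleClient}}, yet code holding a base-type reference pointed at a WebHDFS back-end breaks when handed the subclass, which is the Liskov violation described above.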

If there is an eventual goal to move to a dedicated client codebase, then I can 
understand option 1 (reliance on package-private methods) as a short-term 
solution.  I see your point about avoiding relaxing visibility of internals if 
it's only needed short-term.
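A dedicated client codebase could reuse code by composition instead of inheritance. The following is a minimal sketch under stated assumptions: `HttpTransport`, `AdlClient`, `RecordingTransport`, and the `/hypothetical-adl/v1` request format are all hypothetical illustrations, not Hadoop or ADL APIs. The ADL client depends only on a narrow transport interface, so it does not claim to be substitutable for a WebHDFS client, and WebHDFS internals are not visible to it.

```java
import java.util.ArrayList;
import java.util.List;

public class CompositionSketch {

    // Hypothetical narrow transport abstraction: the only dependency of the
    // ADL client. No WebHDFS client internals are reachable through it.
    interface HttpTransport {
        String get(String url);
    }

    // Hypothetical standalone ADL client: holds a transport by composition
    // and owns its request format, so it can deviate from WebHDFS freely.
    static final class AdlClient {
        private final HttpTransport transport;

        AdlClient(HttpTransport transport) {
            this.transport = transport;
        }

        String open(String path) {
            // Hypothetical ADL-specific request format.
            return transport.get("/hypothetical-adl/v1" + path + "?op=OPEN");
        }
    }

    // Stand-in transport that records requests instead of sending HTTP.
    static final class RecordingTransport implements HttpTransport {
        final List<String> requests = new ArrayList<>();

        @Override
        public String get(String url) {
            requests.add(url);
            return "ok";
        }
    }

    public static void main(String[] args) {
        RecordingTransport transport = new RecordingTransport();
        AdlClient client = new AdlClient(transport);
        System.out.println(client.open("/data/file.txt"));
        System.out.println(transport.requests);
    }
}
```

With this shape, a change to the WebHDFS client cannot break compilation of the ADL client, because the two share only the small interface they both agree on.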

> Support Microsoft Azure Data Lake - as a file system in Hadoop
> --------------------------------------------------------------
>
>                 Key: HADOOP-12666
>                 URL: https://issues.apache.org/jira/browse/HADOOP-12666
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: fs, fs/azure, tools
>            Reporter: Vishwajeet Dusane
>            Assignee: Vishwajeet Dusane
>         Attachments: HADOOP-12666-002.patch, HADOOP-12666-003.patch, 
> HADOOP-12666-004.patch, HADOOP-12666-005.patch, HADOOP-12666-006.patch, 
> HADOOP-12666-007.patch, HADOOP-12666-008.patch, HADOOP-12666-1.patch
>
>   Original Estimate: 336h
>          Time Spent: 336h
>  Remaining Estimate: 0h
>
> h2. Description
> This JIRA describes a new file system implementation for accessing Microsoft 
> Azure Data Lake Store (ADL) from within Hadoop. This would enable existing 
> Hadoop applications such as MR, Hive, HBase, etc., to use ADL store as 
> input or output.
>  
> ADL offers ultra-high capacity, is optimized for massive throughput, and has 
> rich management and security features. More details are available at 
> https://azure.microsoft.com/en-us/services/data-lake-store/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
