[ https://issues.apache.org/jira/browse/HADOOP-12666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15186621#comment-15186621 ]
Chris Nauroth commented on HADOOP-12666:
----------------------------------------
[~chris.douglas], thank you for the excellent summary. I agree with the
concerns about coupling with WebHDFS client internals. Asking for a WebHDFS
patch was a way to make that coupling clearer to reviewers by showing
relaxation of visibility explicitly.
Something else to keep in mind is that the audience of contributors on WebHDFS
is likely to be different from the audience of contributors on Azure Data Lake.
If a change in the WebHDFS client forces a change in an Azure Data Lake
subclass to satisfy compilation, then it's not guaranteed that the WebHDFS
contributor is going to be well-equipped to make that change correctly and test
it.
The only way I see to break this coupling is for Azure Data Lake to implement
its own client, without relying on inheritance from WebHDFS for code reuse.
There are already significant protocol deviations from WebHDFS, so an ADL
subclass is going to violate the Liskov substitution principle. (You can never
substitute an instance of {{WebHdfsFileSystem}} with an instance of
{{PrivateAzureDataLakeFileSystem}}, because that won't work correctly with the
WebHDFS back-end.)
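The substitution problem can be sketched in a few lines of Java. This is only an illustration under stated assumptions: {{BaseClient}}, {{DeviatingClient}}, {{StandaloneClient}}, {{BaseBackend}} and the operation name {{VENDOR_OPEN}} are hypothetical stand-ins, not the real {{WebHdfsFileSystem}} or {{PrivateAzureDataLakeFileSystem}} APIs.

```java
/** Hypothetical server that only understands the base protocol's operations. */
class BaseBackend {
    boolean accepts(String op) {
        return op.equals("OPEN") || op.equals("CREATE");
    }
}

/** Hypothetical stand-in for a WebHDFS-style client. */
class BaseClient {
    /** Name of the operation sent to the server when opening a file. */
    String openOp() {
        return "OPEN";
    }
}

/** Hypothetical subclass whose protocol deviates from the base protocol. */
class DeviatingClient extends BaseClient {
    @Override
    String openOp() {
        return "VENDOR_OPEN"; // invented operation name, for illustration only
    }
}

/**
 * Alternative shape: a dedicated client that does not extend BaseClient.
 * It cannot be passed where a BaseClient is expected, so the mismatch
 * becomes a compile-time error instead of a runtime failure.
 */
final class StandaloneClient {
    String openOp() {
        return "VENDOR_OPEN";
    }
}

class LspSketch {
    /** Code written against the base type, talking to the base back-end. */
    static boolean openSucceeds(BaseClient client, BaseBackend backend) {
        return backend.accepts(client.openOp());
    }

    public static void main(String[] args) {
        BaseBackend backend = new BaseBackend();
        // The base client works against the base back-end.
        System.out.println(openSucceeds(new BaseClient(), backend));
        // The subclass type-checks as a BaseClient but is rejected by the back-end.
        System.out.println(openSucceeds(new DeviatingClient(), backend));
    }
}
```

In this sketch the subclass compiles wherever the base type is accepted, yet it fails against the base back-end. The standalone class avoids that by not sharing the base type at all.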
If there is an eventual goal to move to a dedicated client codebase, then I can
understand option 1 (reliance on package-private methods) as a short-term
solution. I see your point about avoiding relaxing visibility of internals if
it's only needed short-term.
> Support Microsoft Azure Data Lake - as a file system in Hadoop
> --------------------------------------------------------------
>
> Key: HADOOP-12666
> URL: https://issues.apache.org/jira/browse/HADOOP-12666
> Project: Hadoop Common
> Issue Type: New Feature
> Components: fs, fs/azure, tools
> Reporter: Vishwajeet Dusane
> Assignee: Vishwajeet Dusane
> Attachments: HADOOP-12666-002.patch, HADOOP-12666-003.patch,
> HADOOP-12666-004.patch, HADOOP-12666-005.patch, HADOOP-12666-006.patch,
> HADOOP-12666-007.patch, HADOOP-12666-008.patch, HADOOP-12666-1.patch
>
> Original Estimate: 336h
> Time Spent: 336h
> Remaining Estimate: 0h
>
> h2. Description
> This JIRA describes a new file system implementation for accessing Microsoft
> Azure Data Lake Store (ADL) from within Hadoop. This would enable existing
> Hadoop applications such as MR, Hive, HBase, etc. to use the ADL store as
> input or output.
>
> ADL is an ultra-high-capacity store, optimized for massive throughput, with
> rich management and security features. More details are available at
> https://azure.microsoft.com/en-us/services/data-lake-store/