[ https://issues.apache.org/jira/browse/HADOOP-12666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15186380#comment-15186380 ]
Chris Douglas commented on HADOOP-12666:
----------------------------------------
bq. Can you create a separate JIRA for the hadoop.hdfs.web patch
(https://reviews.apache.org/r/44169/)? I think we should get that change
reviewed and then we can have a Core Updates patch that doesn't mix up the
packages.
Several (all?) reviewers have raised concerns about extending WebHDFS. The
packaging issue is a minor symptom of the fragile coupling this creates, and it
should be a goal to sever it. I'd like to avoid relaxing WebHDFS impl
visibility to support this kind of extension, since 1) it's not sufficient to
maintain WebHDFS as a public protocol and 2) we're unlikely to restrict
visibility again, when this dependency is no longer required. Of the three
approaches:
# Rely on package-private methods by adding classes to o.a.h.hdfs.web (current
approach)
# Change WebHDFS to support (some) extensions (factories, protected callbacks,
etc.)
# Copy/rename/shade the WebHDFS code this relies on
The first seems least harmful, unless (2) can be implemented cleanly and
without becoming too entangled in WebHDFS internals. [~fabbri], [~cnauroth],
[~mackrorysd], [~eddyxu], thoughts?
[~vishwajeet.dusane]: Thanks for splitting the patch; this is significantly
easier to review.
The implementation of buffered I/O is complex. Some techniques, like ping-pong
buffers, may not offer many advantages over allowing buffers to be GC'd. I
haven't looked in detail, but the singleton {{BufferManager}} is not thread-safe
and appears to be shared across all streams. Now that the patch is simpler,
could you briefly summarize how the buffering works and why it wins over
simpler, naive approaches?
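
For comparison, the kind of "simpler, naive approach" asked about above might look like the sketch below. It is an illustration under stated assumptions, not code from the patch: the class name {{PerStreamBufferedInput}} and its constructor are hypothetical, and it only shows the general idea that each stream owns a private buffer (so nothing is shared across streams and no synchronization between streams is needed) and that the buffer is simply left to the GC once the stream is closed.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

/**
 * Hypothetical illustration only; not taken from the HADOOP-12666 patch.
 * Each instance owns its buffer, so no buffer state is shared across streams.
 */
class PerStreamBufferedInput extends InputStream {
  private final InputStream in; // underlying (e.g. remote) stream
  private final byte[] buf;     // private to this stream
  private int pos = 0;          // index of the next byte to hand out
  private int limit = 0;        // number of valid bytes currently in buf

  PerStreamBufferedInput(InputStream in, int bufferSize) {
    if (bufferSize <= 0) {
      throw new IllegalArgumentException("bufferSize must be positive");
    }
    this.in = in;
    this.buf = new byte[bufferSize];
  }

  /** Refills the private buffer; returns false when no more data is available. */
  private boolean fill() throws IOException {
    int n = in.read(buf, 0, buf.length);
    if (n <= 0) {
      return false; // treat "no bytes" as end of stream
    }
    pos = 0;
    limit = n;
    return true;
  }

  @Override
  public int read() throws IOException {
    if (pos >= limit && !fill()) {
      return -1;
    }
    return buf[pos++] & 0xff;
  }

  @Override
  public int read(byte[] dst, int off, int len) throws IOException {
    if (off < 0 || len < 0 || len > dst.length - off) {
      throw new IndexOutOfBoundsException();
    }
    if (len == 0) {
      return 0;
    }
    if (pos >= limit && !fill()) {
      return -1;
    }
    // Serve at most what is buffered; callers loop as with any InputStream.
    int n = Math.min(len, limit - pos);
    System.arraycopy(buf, pos, dst, off, n);
    pos += n;
    return n;
  }

  @Override
  public void close() throws IOException {
    // The buffer becomes unreachable together with the stream and is
    // reclaimed by the GC; there is no pool to return it to.
    in.close();
  }
}
```

A single instance is still not safe for concurrent use by multiple threads, which matches the usual {{InputStream}} contract; the point of the sketch is only that two streams never touch each other's buffers. A summary of where the patch's buffering beats something of this shape would help reviewers.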
> Support Microsoft Azure Data Lake - as a file system in Hadoop
> --------------------------------------------------------------
>
> Key: HADOOP-12666
> URL: https://issues.apache.org/jira/browse/HADOOP-12666
> Project: Hadoop Common
> Issue Type: New Feature
> Components: fs, fs/azure, tools
> Reporter: Vishwajeet Dusane
> Assignee: Vishwajeet Dusane
> Attachments: HADOOP-12666-002.patch, HADOOP-12666-003.patch,
> HADOOP-12666-004.patch, HADOOP-12666-005.patch, HADOOP-12666-006.patch,
> HADOOP-12666-007.patch, HADOOP-12666-008.patch, HADOOP-12666-1.patch
>
> Original Estimate: 336h
> Time Spent: 336h
> Remaining Estimate: 0h
>
> h2. Description
> This JIRA describes a new file system implementation for accessing Microsoft
> Azure Data Lake Store (ADL) from within Hadoop. This would enable existing
> Hadoop applications such as MR, Hive, HBase, etc. to use ADL store as
> input or output.
>
> ADL is an ultra-high capacity store, optimized for massive throughput, with
> rich management and security features. More details are available at
> https://azure.microsoft.com/en-us/services/data-lake-store/