[jira] [Commented] (HADOOP-12666) Support Microsoft Azure Data Lake - as a file system in Hadoop

Chris Douglas (JIRA) Tue, 08 Mar 2016 18:50:16 -0800

    [ 
https://issues.apache.org/jira/browse/HADOOP-12666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15186380#comment-15186380
 ]


Chris Douglas commented on HADOOP-12666:
----------------------------------------

bq. Can you create a separate JIRA for the hadoop.hdfs.web patch 
(https://reviews.apache.org/r/44169/)? I think we should get that change 
reviewed and then we can have a Core Updates patch that doesn't mix up the 
packages.

Several (all?) reviewers have raised concerns about extending WebHDFS. The 
packaging issue is a minor symptom of the fragile coupling this creates, and it 
should be a goal to sever it. I'd like to avoid relaxing WebHDFS impl 
visibility to support this kind of extension, since 1) it's not sufficient to 
maintain WebHDFS as a public protocol and 2) we're unlikely to restrict 
visibility again, when this dependency is no longer required. Of the three 
approaches:
# Rely on package-private methods by adding classes to o.a.h.hdfs.web (current 
approach)
# Change WebHDFS to support (some) extensions (factories, protected callbacks, 
etc.)
# Copy/rename/shade the WebHDFS code this relies on

The first seems least harmful, unless (2) can be implemented cleanly and 
without becoming too entangled in WebHDFS internals. [~fabbri], [~cnauroth], 
[~mackrorysd], [~eddyxu], thoughts?
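
To make (2) concrete, here is a minimal Java sketch of the two extension styles it names: a protected callback and an injected factory. Every name in it ({{BaseRestClient}}, {{ExtendedRestClient}}, {{scheme()}}, {{describeRequest}}, the {{example.invalid}} host) is hypothetical and chosen for illustration. None of it is WebHDFS code or the patch's code; it only shows the shape of an extension point that does not need package-private access.

```java
import java.util.function.Supplier;

// Hypothetical stand-in for a REST-backed client. This is NOT WebHdfsFileSystem.
class BaseRestClient {
    // Injected factory: callers decide how credentials are produced.
    private final Supplier<String> tokenSource;

    BaseRestClient(Supplier<String> tokenSource) {
        this.tokenSource = tokenSource;
    }

    // Protected callback: the one method subclasses are allowed to override.
    protected String scheme() {
        return "http";
    }

    // Final, so subclasses cannot reach into the request-building internals.
    public final String describeRequest(String path) {
        return scheme() + "://example.invalid" + path
            + " [auth=" + tokenSource.get() + "]";
    }
}

// A subclass in any package can customize behavior through the callback alone.
final class ExtendedRestClient extends BaseRestClient {
    ExtendedRestClient(Supplier<String> tokenSource) {
        super(tokenSource);
    }

    @Override
    protected String scheme() {
        return "https";
    }

    public static void main(String[] args) {
        BaseRestClient client = new ExtendedRestClient(() -> "oauth");
        System.out.println(client.describeRequest("/a"));
    }
}
```

The trade-off is the one raised above: each protected method or factory becomes something the base class has to keep stable, so this only pays off if the set of hooks stays small.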

[~vishwajeet.dusane]: Thanks for splitting the patch; this is significantly 
easier to review.

The implementation of buffered I/O is complex. Some techniques, like ping-pong 
buffers, may not offer many advantages over allowing buffers to be GC'd. I 
haven't looked in detail, but the singleton {{BufferManager}} is not thread-safe 
and appears to be shared across all streams. Now that the patch is simpler, 
could you briefly summarize how the buffering works and why it wins over 
simpler, naive approaches?
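
For comparison, the kind of naive approach meant here could look like the following Java sketch, in which each stream owns a single private buffer and nothing is shared between streams. The class name {{PerStreamBufferedReader}}, its fields, and the {{refillCount()}} helper are hypothetical. A plain {{InputStream}} stands in for the remote read call, and this is not the patch's {{BufferManager}}.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of a per-stream buffer. NOT the patch's BufferManager.
final class PerStreamBufferedReader extends InputStream {
    private final InputStream remote; // stands in for a remote read call
    private final byte[] buf;         // owned by this stream only, GC'd with it
    private int pos = 0;              // next byte to hand out
    private int limit = 0;            // number of valid bytes in buf
    private int refills = 0;          // how many times the buffer was refilled

    PerStreamBufferedReader(InputStream remote, int bufferSize) {
        if (bufferSize <= 0) {
            throw new IllegalArgumentException("bufferSize must be positive");
        }
        this.remote = remote;
        this.buf = new byte[bufferSize];
    }

    @Override
    public int read() throws IOException {
        if (pos >= limit && !fill()) {
            return -1; // end of stream
        }
        return buf[pos++] & 0xff;
    }

    // Refill the private buffer; returns false once the source is exhausted.
    private boolean fill() throws IOException {
        int n = remote.read(buf, 0, buf.length);
        if (n <= 0) {
            return false;
        }
        pos = 0;
        limit = n;
        refills++;
        return true;
    }

    int refillCount() {
        return refills;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "0123456789".getBytes(StandardCharsets.US_ASCII);
        PerStreamBufferedReader in =
            new PerStreamBufferedReader(new ByteArrayInputStream(data), 4);
        StringBuilder sb = new StringBuilder();
        int b;
        while ((b = in.read()) != -1) {
            sb.append((char) b);
        }
        System.out.println(sb + " refills=" + in.refillCount());
    }
}
```

Because no state is shared across streams, this version needs no synchronization between them. What it gives up is any read-ahead overlap, which is the advantage the summary would need to demonstrate.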

> Support Microsoft Azure Data Lake - as a file system in Hadoop
> --------------------------------------------------------------
>
>                 Key: HADOOP-12666
>                 URL: https://issues.apache.org/jira/browse/HADOOP-12666
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: fs, fs/azure, tools
>            Reporter: Vishwajeet Dusane
>            Assignee: Vishwajeet Dusane
>         Attachments: HADOOP-12666-002.patch, HADOOP-12666-003.patch, 
> HADOOP-12666-004.patch, HADOOP-12666-005.patch, HADOOP-12666-006.patch, 
> HADOOP-12666-007.patch, HADOOP-12666-008.patch, HADOOP-12666-1.patch
>
>   Original Estimate: 336h
>          Time Spent: 336h
>  Remaining Estimate: 0h
>
> h2. Description
> This JIRA describes a new file system implementation for accessing Microsoft 
> Azure Data Lake Store (ADL) from within Hadoop. This would enable existing 
> Hadoop applications such as MR, Hive, HBase, etc. to use the ADL store as 
> input or output.
>  
> ADL is ultra-high capacity, optimized for massive throughput, with rich 
> management and security features. More details are available at 
> https://azure.microsoft.com/en-us/services/data-lake-store/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
