[jira] [Commented] (HADOOP-12666) Support Microsoft Azure Data Lake - as a file system in Hadoop

Vishwajeet Dusane (JIRA) Thu, 10 Mar 2016 08:44:11 -0800

    [ 
https://issues.apache.org/jira/browse/HADOOP-12666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15189518#comment-15189518
 ]


Vishwajeet Dusane commented on HADOOP-12666:
--------------------------------------------

*Notes From Mar 9, 2016 Call w/ MSFT*
Who: Cloudera: Aaron Fabbri, Tony Wu; MSFT: Vishwajeet, Cathy, Chris Douglas, 
Shrikant
*Discussion*
1. Packaging / Code Structure
 - In general, an ADL extension of WebHDFS would not be acceptable as a 
long-term solution
 - The WebHDFS client is not designed for extension.
 - [Available options as of 
today|https://issues.apache.org/jira/browse/HADOOP-12666?focusedCommentId=15186380&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15186380]
 - Option 1 vs 2 (refactor WebHDFS) vs 3 (copy paste code, bad) 
 - Option 2 (MSFT): WebHDFS needs to be changed to accommodate ADL. May be 
significant work.
 - Raise a separate JIRA for WebHDFS extension

2. WebHDFS and ADL cannot co-exist if both use the OAuth2 authentication 
protocol
 - Near term: state the limitation that only one WebHDFS client at a time can 
use OAuth2. OK to have non-OAuth2 WebHDFS and ADL configured on the same 
cluster. - AP: Vishwajeet to document as a known limitation
 - Long term: v2 of the ADL connector that better factors out WebHDFS client 
commonality

3. Integrity / Semantics
 - Single writer semantics?
 - See leaseId in PrivateAzureDataLakeFileSystem::createNonRecursive()
 - Append does not close the connection, hence the leaseId is not required.

4. Action Items
 - [msft] Put the WebHDFS extension issue into a separate JIRA so folks from 
the community can comment.  Do they prefer that hadoop-azure-datalake mixes 
packages, relaxing some method privacy, or some other approach? - Raised 
HDFS-9938
 - [msft] volatile is not needed in addition to synchronized in 
BatchByteArrayInputStream - AP: Vishwajeet
 - [msft] Add to documentation: caveat for v1 that only one WebHDFS client 
(ADL or vanilla) can use OAuth2, not both. - AP: Vishwajeet
 - [cloudera] Go over latest patches.
 - [cloudera] Reach out to other Hadoop committers to see what else needs 
addressing before the patch can be committed.
 - [msft/cloudera] Start document on adl:// semantics, deltas versus HDFS, w/ 
and w/o FileStatusCache
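
On the volatile/synchronized action item above: in Java, a field that is only read and written while holding the same monitor does not also need to be volatile, because releasing and re-acquiring the monitor already gives the visibility guarantee. A minimal sketch, assuming a hypothetical GuardedPosition class (it is not BatchByteArrayInputStream from the patch, whose fields are not reproduced here):

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in; NOT the BatchByteArrayInputStream from the patch. */
final class GuardedPosition {
    // Plain (non-volatile) field: every access goes through a synchronized
    // method on the same monitor, which provides both mutual exclusion and
    // the happens-before edge needed for visibility between threads.
    private long position;

    synchronized void advance(long n) {
        position += n;
    }

    synchronized long get() {
        return position;
    }
}

final class GuardedPositionDemo {
    /** Runs `threads` workers, each advancing the position `stepsPerThread` times. */
    static long run(int threads, int stepsPerThread) {
        GuardedPosition pos = new GuardedPosition();
        List<Thread> workers = new ArrayList<>();
        for (int i = 0; i < threads; i++) {
            Thread t = new Thread(() -> {
                for (int j = 0; j < stepsPerThread; j++) {
                    pos.advance(1);
                }
            });
            workers.add(t);
            t.start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return pos.get();
    }

    public static void main(String[] args) {
        // No updates are lost, so this prints 40000.
        System.out.println(run(4, 10_000));
    }
}
```

The caveat is that this only holds if no code path touches the field outside a synchronized block; a single unsynchronized read would bring the visibility question back.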

5. Follow Up Topics (homework / next meeting)
- Follow up on append().  No leaseId.  What is the delta from HDFS semantics?
- BufferManager purpose, coherency: it is for readahead, so that multiple 
FSInputStreams can see the same buffer that was fetched with readahead.
- Follow up on flushAsync() in write path (why / how)
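
To make the BufferManager item above concrete, here is a minimal sketch of one readahead buffer shared by several input streams. Everything in it is an assumption for illustration: the class name SharedReadAheadBuffer, keying by path and offset, and the single-block policy are hypothetical and are not taken from the patch.

```java
import java.util.Arrays;

/** Hypothetical sketch; NOT the BufferManager class from the patch. */
final class SharedReadAheadBuffer {
    private String path;   // file the cached bytes belong to
    private long offset;   // file offset of data[0]
    private byte[] data;   // bytes fetched by the most recent readahead

    /** Publish a freshly fetched readahead block, replacing any previous one. */
    synchronized void put(String path, long offset, byte[] fetched) {
        this.path = path;
        this.offset = offset;
        this.data = Arrays.copyOf(fetched, fetched.length);
    }

    /**
     * Copy up to len cached bytes starting at file position pos into dst.
     * Returns the number of bytes copied, or -1 on a cache miss.
     */
    synchronized int read(String path, long pos, byte[] dst, int dstOff, int len) {
        if (data == null || !path.equals(this.path)) {
            return -1; // nothing cached for this file
        }
        if (pos < offset || pos >= offset + data.length) {
            return -1; // requested position is outside the cached block
        }
        int start = (int) (pos - offset);
        int n = Math.min(len, data.length - start);
        System.arraycopy(data, start, dst, dstOff, n);
        return n;
    }
}
```

Any stream that asks for a position inside the cached block is served from memory instead of issuing another remote read. The sketch deliberately ignores coherency: nothing invalidates the block when the file changes, which is the open question the notes flag.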

6. Future plan of ADL client implementation
 - Share with community about future plans
 - Versioning


> Support Microsoft Azure Data Lake - as a file system in Hadoop
> --------------------------------------------------------------
>
>                 Key: HADOOP-12666
>                 URL: https://issues.apache.org/jira/browse/HADOOP-12666
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: fs, fs/azure, tools
>            Reporter: Vishwajeet Dusane
>            Assignee: Vishwajeet Dusane
>         Attachments: HADOOP-12666-002.patch, HADOOP-12666-003.patch, 
> HADOOP-12666-004.patch, HADOOP-12666-005.patch, HADOOP-12666-006.patch, 
> HADOOP-12666-007.patch, HADOOP-12666-008.patch, HADOOP-12666-1.patch
>
>   Original Estimate: 336h
>          Time Spent: 336h
>  Remaining Estimate: 0h
>
> h2. Description
> This JIRA describes a new file system implementation for accessing Microsoft 
> Azure Data Lake Store (ADL) from within Hadoop. This would enable existing 
> Hadoop applications such as MR, Hive, HBase, etc. to use the ADL store as 
> input or output.
>  
> ADL is ultra-high capacity and optimized for massive throughput, with rich 
> management and security features. More details are available at 
> https://azure.microsoft.com/en-us/services/data-lake-store/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
