[
https://issues.apache.org/jira/browse/HADOOP-12666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15189518#comment-15189518
]
Vishwajeet Dusane commented on HADOOP-12666:
--------------------------------------------
*Notes From Mar 9, 2016 Call w/ MSFT*
Who: Cloudera: Aaron Fabbri, Tony Wu; MSFT: Vishwajeet, Cathy, Chris Douglas, Shrikant
*Discussion*
1. Packaging / Code Structure
- In general, an ADL extension of WebHDFS would not be acceptable as a long-term solution
- The WebHDFS client is not designed for extension.
- [Available options as of
today|https://issues.apache.org/jira/browse/HADOOP-12666?focusedCommentId=15186380&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15186380]
- Option 1 vs. 2 (refactor WebHDFS) vs. 3 (copy-paste code, bad)
- Option 2 (MSFT): requires changes to WebHDFS so that it can accept the ADL extensions. This may be significant work.
- Raise a separate JIRA for the WebHDFS extension
2. WebHDFS and ADL cannot co-exist if both use the OAuth2 authentication protocol
- Near term: state the limitation that only one WebHDFS client at a time can use OAuth2. It is OK to have non-OAuth WebHDFS and ADL configured on the same cluster. - AP: Vishwajeet to document as a known limitation
- Long term: v2 of the ADL connector that better factors out the code it has in common with the WebHDFS client
3. Integrity / Semantics
- Single-writer semantics?
- See leaseId in PrivateAzureDataLakeFileSystem::createNonRecursive()
- Append does not close the connection, hence the leaseId is not required.
4. Action Items
- [msft] Put the WebHDFS extension issue into a separate JIRA so folks from the community can comment. Do they prefer that hadoop-azure-datalake mix packages, that some method visibility be relaxed, or another approach? - Raised HDFS-9938
- [msft] volatile is not needed in addition to synchronized in BatchByteArrayInputStream - AP: Vishwajeet
- [msft] Add to documentation: caveat for v1 that only one WebHDFS client (ADL or vanilla) can use OAuth2, not both. - AP: Vishwajeet
- [cloudera] Go over latest patches.
- [cloudera] Reach out to other Hadoop committers to see what else needs addressing before this can be committed.
- [msft/cloudera] Start a document on adl:// semantics and deltas versus HDFS, with and without FileStatusCache
5. Follow Up Topics (homework / next meeting)
- Follow up on append(): no leaseId. What is the delta from HDFS semantics?
- BufferManager purpose, coherency
- For readahead, so that multiple FSInputStreams can see the same buffer that was fetched with readahead.
- Follow up on flushAsync() in the write path (why / how)
6. Future plans for the ADL client implementation
- Share future plans with the community
- Versioning
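The volatile action item in 4 rests on the Java memory model: if every read and write of a field happens inside synchronized methods on the same object, releasing the monitor happens-before the next acquire of it, so volatile adds nothing. A minimal sketch of that pattern, using a hypothetical SyncedBuffer class rather than the actual BatchByteArrayInputStream from the patch:

```java
// Hypothetical illustration only; NOT BatchByteArrayInputStream from the ADL patch.
class SyncedBuffer {
    // 'final' array reference: safely published by final-field semantics.
    private final byte[] data;
    // No 'volatile' needed: pos is only touched while holding this object's
    // monitor, and a monitor release happens-before every later acquire of
    // the same monitor (JLS 17.4.5), so each synchronized method sees the
    // value written by the previous one, whichever thread ran it.
    private int pos;

    SyncedBuffer(byte[] data) {
        this.data = data.clone();
    }

    // Returns the next byte as 0..255, or -1 at end of buffer.
    synchronized int read() {
        if (pos >= data.length) {
            return -1;
        }
        return data[pos++] & 0xFF;
    }

    synchronized int position() {
        return pos;
    }
}
```

volatile would only become necessary if some unsynchronized method read pos directly.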
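The BufferManager topic in item 5 is about letting several input streams share data fetched by readahead. A minimal sketch of the idea, assuming a hypothetical ReadaheadCache keyed by path and block-aligned offset (this is not the BufferManager API from the patch): a second stream that reads the same block reuses the bytes the first stream already fetched.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch of a shared readahead cache; NOT the patch's BufferManager.
class ReadaheadCache {
    private final Map<String, byte[]> blocks = new HashMap<>();
    private final int blockSize;
    private int fetches; // number of times the backing store was actually read

    ReadaheadCache(int blockSize) {
        this.blockSize = blockSize;
    }

    // Returns the block containing 'offset', calling 'fetcher' (given the
    // block's start offset) only if no stream has fetched that block yet.
    synchronized byte[] getBlock(String path, long offset, Function<Long, byte[]> fetcher) {
        long blockStart = (offset / blockSize) * blockSize;
        String key = path + "@" + blockStart;
        byte[] block = blocks.get(key);
        if (block == null) {
            block = fetcher.apply(blockStart);
            blocks.put(key, block);
            fetches++;
        }
        return block;
    }

    synchronized int fetchCount() {
        return fetches;
    }
}
```

The sketch never invalidates a cached block, so a reader could see stale bytes after the file changes; that is exactly the coherency question raised in item 5.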
> Support Microsoft Azure Data Lake - as a file system in Hadoop
> --------------------------------------------------------------
>
> Key: HADOOP-12666
> URL: https://issues.apache.org/jira/browse/HADOOP-12666
> Project: Hadoop Common
> Issue Type: New Feature
> Components: fs, fs/azure, tools
> Reporter: Vishwajeet Dusane
> Assignee: Vishwajeet Dusane
> Attachments: HADOOP-12666-002.patch, HADOOP-12666-003.patch,
> HADOOP-12666-004.patch, HADOOP-12666-005.patch, HADOOP-12666-006.patch,
> HADOOP-12666-007.patch, HADOOP-12666-008.patch, HADOOP-12666-1.patch
>
> Original Estimate: 336h
> Time Spent: 336h
> Remaining Estimate: 0h
>
> h2. Description
> This JIRA describes a new file system implementation for accessing Microsoft
> Azure Data Lake Store (ADL) from within Hadoop. This would enable existing
> Hadoop applications such as MR, Hive, HBase, etc. to use ADL store as
> input or output.
>
> ADL offers ultra-high capacity, is optimized for massive throughput, and has rich
> management and security features. More details available at
> https://azure.microsoft.com/en-us/services/data-lake-store/