[https://issues.apache.org/jira/browse/HADOOP-12666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15213834#comment-15213834]
Vishwajeet Dusane commented on HADOOP-12666:
--------------------------------------------
[~twu] Thank you for the comments.
{quote}
Regarding the semantics document. It will be great if you can also include
more information on how ADL backend can "lock" a file for write....
{quote}
As discussed and covered in the MOM (minutes of meeting) shared on this JIRA: similar to WASB, the ADL
client implements a write "lock" during createNonRecursive. FileSystem
append and create calls do not lock a file; a lease ID mechanism provides the "lock"
on a file. WASB and S3 take a similar approach. This is also covered in the
semantics document. Please let me know if I misread your question.
{quote}
User and group information returned by ListStatus and GetFileStatus is in the form
of GUIDs associated with Azure Active Directory.
{quote}
There is an ongoing effort to support UPNs (user principal names) instead of GUIDs.
I am skipping all questions and comments on flushAsync, since I am already working
on a better append algorithm. To avoid any confusion, I will remove the flushAsync
implementation from this review and raise a separate JIRA when the fast append
feature is ready.
{quote}
The stream-closed check is missing in BatchAppendOutputStream. This check is
present in BatchByteArrayInputStream.
{quote}
This is a corner case; however, I will update BatchAppendOutputStream to include
the closed check as well.
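A minimal sketch of such a closed check, assuming a simple delegating wrapper around another OutputStream. The class names ClosedCheckOutputStream and ClosedCheckDemo are hypothetical; this is not the BatchAppendOutputStream code from the patch, only an illustration of failing fast on use after close() while keeping close() idempotent as java.io.Closeable requires.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

/** Hypothetical wrapper illustrating a "stream is closed" guard. */
class ClosedCheckOutputStream extends OutputStream {
  private final OutputStream delegate;
  private boolean closed = false;

  ClosedCheckOutputStream(OutputStream delegate) {
    this.delegate = delegate;
  }

  // Fail fast if a caller uses the stream after close().
  private void checkNotClosed() throws IOException {
    if (closed) {
      throw new IOException("Stream is closed");
    }
  }

  @Override
  public void write(int b) throws IOException {
    checkNotClosed();
    delegate.write(b);
  }

  @Override
  public void write(byte[] b, int off, int len) throws IOException {
    checkNotClosed();
    delegate.write(b, off, len);
  }

  @Override
  public void flush() throws IOException {
    checkNotClosed();
    delegate.flush();
  }

  // A second close() is a no-op, as java.io.Closeable specifies.
  @Override
  public void close() throws IOException {
    if (closed) {
      return;
    }
    closed = true;
    delegate.close();
  }
}

class ClosedCheckDemo {
  public static void main(String[] args) throws IOException {
    ByteArrayOutputStream sink = new ByteArrayOutputStream();
    ClosedCheckOutputStream out = new ClosedCheckOutputStream(sink);
    out.write(new byte[] {1, 2, 3});
    out.close();
    try {
      out.write(4); // rejected: the stream is already closed
    } catch (IOException e) {
      System.out.println("write after close rejected: " + e.getMessage());
    }
  }
}
```

Running ClosedCheckDemo prints "write after close rejected: Stream is closed", while the three bytes written before close() reach the underlying sink.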
> Support Microsoft Azure Data Lake - as a file system in Hadoop
> --------------------------------------------------------------
>
> Key: HADOOP-12666
> URL: https://issues.apache.org/jira/browse/HADOOP-12666
> Project: Hadoop Common
> Issue Type: New Feature
> Components: fs, fs/azure, tools
> Reporter: Vishwajeet Dusane
> Assignee: Vishwajeet Dusane
> Attachments: Create_Read_Hadoop_Adl_Store_Semantics.pdf,
> HADOOP-12666-002.patch, HADOOP-12666-003.patch, HADOOP-12666-004.patch,
> HADOOP-12666-005.patch, HADOOP-12666-006.patch, HADOOP-12666-007.patch,
> HADOOP-12666-008.patch, HADOOP-12666-009.patch, HADOOP-12666-1.patch
>
> Original Estimate: 336h
> Time Spent: 336h
> Remaining Estimate: 0h
>
> h2. Description
> This JIRA describes a new file system implementation for accessing Microsoft
> Azure Data Lake Store (ADL) from within Hadoop. This would enable existing
> Hadoop applications such as MR, Hive, HBase, etc., to use the ADL store as
> input or output.
>
> ADL offers ultra-high capacity, is optimized for massive throughput, and has rich
> management and security features. More details are available at
> https://azure.microsoft.com/en-us/services/data-lake-store/
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)