[jira] [Commented] (HADOOP-12666) Support Microsoft Azure Data Lake - as a file system in Hadoop

Vishwajeet Dusane (JIRA) Sun, 27 Mar 2016 22:46:13 -0700

    [ 
https://issues.apache.org/jira/browse/HADOOP-12666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15213834#comment-15213834
 ]


Vishwajeet Dusane commented on HADOOP-12666:
--------------------------------------------

[~twu] Thank you for the comments. 

{quote}
 Regarding the semantics document. It will be great if you can also include 
more information on how ADL backend can "lock" a file for write....
{quote}

As we discussed, and as covered in the minutes of meeting (MOM) shared on this 
JIRA: similar to WASB, the ADL client implements a "lock" for write during 
createNonRecursive. FileSystem append and create calls do not lock a file. The 
lease ID mechanism ensures the "lock" over a file. A similar approach is taken 
by WASB and S3. The same is covered in the semantics document as well. Please 
let me know if I misread your question.


{quote}
User and group information returned as ListStatus and GetFileStatus is in form 
of GUID associated in Azure Active Directory.
{quote}

There is an ongoing effort to support UPN instead of GUID.

I am skipping all the questions/comments on flushAsync, since I am already 
working on a better append algorithm. It would be best to remove the flushAsync 
implementation from this review to avoid any confusion. I will raise a separate 
JIRA when the fast append feature is ready.

{quote}
Stream is closed check is missing in BatchAppendOutputStream. This check is 
present for BatchByteArrayInputStream.
{quote}
This is a corner case; however, I will update BatchAppendOutputStream to have 
the closed check as well.
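
For reference, a closed check of this kind usually looks something like the 
sketch below. GuardedOutputStream is a hypothetical stand-in, not the actual 
BatchAppendOutputStream from the patch; the in-memory sink and the size() 
helper are assumptions made only to keep the example self-contained.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

/** Hypothetical stand-in for an append stream; not the class from the patch. */
class GuardedOutputStream extends OutputStream {
    // In-memory sink, assumed here in place of the remote append call.
    private final ByteArrayOutputStream sink = new ByteArrayOutputStream();
    private boolean closed = false;

    // Every write path calls this first, so use after close() fails fast.
    private void checkNotClosed() throws IOException {
        if (closed) {
            throw new IOException("Stream is closed");
        }
    }

    @Override
    public synchronized void write(int b) throws IOException {
        checkNotClosed();
        sink.write(b);
    }

    @Override
    public synchronized void write(byte[] buf, int off, int len) throws IOException {
        checkNotClosed();
        sink.write(buf, off, len);
    }

    @Override
    public synchronized void flush() throws IOException {
        checkNotClosed();
        sink.flush();
    }

    @Override
    public synchronized void close() {
        // Idempotent: closing an already closed stream is a no-op.
        closed = true;
    }

    /** Number of bytes accepted so far (helper for the example only). */
    synchronized int size() {
        return sink.size();
    }
}
```

With this shape, write and flush after close() throw an IOException instead of 
silently buffering data that will never be sent.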


> Support Microsoft Azure Data Lake - as a file system in Hadoop
> --------------------------------------------------------------
>
>                 Key: HADOOP-12666
>                 URL: https://issues.apache.org/jira/browse/HADOOP-12666
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: fs, fs/azure, tools
>            Reporter: Vishwajeet Dusane
>            Assignee: Vishwajeet Dusane
>         Attachments: Create_Read_Hadoop_Adl_Store_Semantics.pdf, 
> HADOOP-12666-002.patch, HADOOP-12666-003.patch, HADOOP-12666-004.patch, 
> HADOOP-12666-005.patch, HADOOP-12666-006.patch, HADOOP-12666-007.patch, 
> HADOOP-12666-008.patch, HADOOP-12666-009.patch, HADOOP-12666-1.patch
>
>   Original Estimate: 336h
>          Time Spent: 336h
>  Remaining Estimate: 0h
>
> h2. Description
> This JIRA describes a new file system implementation for accessing Microsoft 
> Azure Data Lake Store (ADL) from within Hadoop. This would enable existing 
> Hadoop applications such as MR, Hive, HBase, etc. to use ADL store as 
> input or output.
>  
> ADL is ultra-high capacity and optimized for massive throughput, with rich 
> management and security features. More details available at 
> https://azure.microsoft.com/en-us/services/data-lake-store/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
