vishwajeet dusane created HADOOP-12666:
------------------------------------------
Summary: Support Windows Azure Data Lake - as a file system in
Hadoop
Key: HADOOP-12666
URL: https://issues.apache.org/jira/browse/HADOOP-12666
Project: Hadoop Common
Issue Type: New Feature
Components: tools
Reporter: vishwajeet dusane
h2. Description
This JIRA describes a new file system implementation for accessing Windows
Azure Data Lake Store (ADL) from within Hadoop. This would enable existing
Hadoop applications such as MapReduce, Hive, and HBase to use the ADL store as
input or output.
ADL is an ultra-high-capacity store, optimized for massive throughput, with
rich management and security features. More details are available at
https://azure.microsoft.com/en-us/services/data-lake-store/
h2. High level design
The ADL file system exposes RESTful interfaces compatible with the WebHDFS
specification 2.7.1.
At a high level, the code here extends the SWebHdfsFileSystem class to provide
an implementation for accessing ADL storage; the adl scheme is used for
accessing it over HTTPS. We use the URI scheme:
{code}adl://<URI to account>/path/to/file{code}
to address individual files and folders. Tests are implemented mostly using a
Contract implementation for the ADL functionality, with an option to test
against a real ADL store if configured.
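
To illustrate the addressing scheme, here is a minimal, standalone Java sketch that splits an adl:// URI into its account and path parts using only java.net.URI. It is not the code from the patch. The class name AdlUriSketch, the host myaccount.example.net, and the helper names are hypothetical. The HTTPS form it prints assumes the standard WebHDFS REST layout (/webhdfs/v1/<path>?op=...); whether the ADL implementation builds its requests in exactly this way is an assumption.

```java
import java.net.URI;

// Illustrative sketch only; names and host are hypothetical placeholders.
final class AdlUriSketch {

    /** True when the URI uses the adl scheme (compared case-insensitively). */
    static boolean isAdl(URI uri) {
        return "adl".equalsIgnoreCase(uri.getScheme());
    }

    /** The "<URI to account>" part of adl://<URI to account>/path/to/file. */
    static String account(URI uri) {
        requireAdl(uri);
        return uri.getAuthority();
    }

    /** The file or folder path within the account. */
    static String path(URI uri) {
        requireAdl(uri);
        return uri.getPath();
    }

    /**
     * Assumption: maps the adl:// URI to a WebHDFS-style HTTPS URL
     * for the given operation, following the WebHDFS REST convention.
     */
    static String toWebHdfsUrl(URI uri, String op) {
        return "https://" + account(uri) + "/webhdfs/v1" + path(uri) + "?op=" + op;
    }

    private static void requireAdl(URI uri) {
        if (!isAdl(uri)) {
            throw new IllegalArgumentException("Not an adl:// URI: " + uri);
        }
    }

    public static void main(String[] args) {
        URI uri = URI.create("adl://myaccount.example.net/path/to/file");
        System.out.println(account(uri));
        System.out.println(path(uri));
        System.out.println(toWebHdfsUrl(uri, "GETFILESTATUS"));
    }
}
```

In an actual Hadoop job the same adl:// string would be handed to the regular FileSystem API rather than parsed by hand; the sketch only shows which part of the URI identifies the account and which part identifies the file.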
h2. Credits and history
This has been ongoing work for a while, and the early version of this work can
be seen in. Credit for this work goes to the team: [~vishwajeet.dusane],
[~snayak], [~srevanka], [~kiranch], [~chakrab], [~omkarksa], [~snvijaya],
[~ansaiprasanna], [~jsangwan]
h2. Test
Besides Contract tests, we have used ADL as the additional file system in the
current public preview release. Various customer and test workloads have been
run against clusters with such configurations for quite some time. The current
version reflects the version of the code tested and used in our production
environment.