[ https://issues.apache.org/jira/browse/HADOOP-9629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Karthik Kambatla updated HADOOP-9629:
-------------------------------------
Target Version/s: 2.6.0 (was: 3.0.0, 2.5.0)
> Support Windows Azure Storage - Blob as a file system in Hadoop
> ---------------------------------------------------------------
>
> Key: HADOOP-9629
> URL: https://issues.apache.org/jira/browse/HADOOP-9629
> Project: Hadoop Common
> Issue Type: New Feature
> Components: tools
> Reporter: Mostafa Elhemali
> Assignee: Mike Liddell
> Fix For: 3.0.0
>
> Attachments: HADOOP-9629 - Azure Filesystem - Information for developers.docx,
> HADOOP-9629 - Azure Filesystem - Information for developers.pdf,
> HADOOP-9629.2.patch, HADOOP-9629.3.patch, HADOOP-9629.patch,
> HADOOP-9629.trunk.1.patch, HADOOP-9629.trunk.2.patch,
> HADOOP-9629.trunk.3.patch, HADOOP-9629.trunk.4.patch,
> HADOOP-9629.trunk.5.patch
>
>
> h2. Description
> This JIRA adds a new file system implementation for accessing Windows Azure
> Storage - Blob from within Hadoop, for example to use blobs as input to MR
> jobs or to have MR jobs write their output directly to blob storage.
> h2. High level design
> At a high level, the code extends the FileSystem class to provide an
> implementation for accessing blob storage; the scheme wasb accesses it over
> HTTP, and wasbs accesses it over HTTPS. Individual blobs are addressed with
> the URI scheme {code}wasb[s]://<container>@<account>/path/to/file{code}. We
> use the standard Azure Java SDK (com.microsoft.windowsazure) to do most of
> the work. To map a hierarchical file system onto the flat name-value
> structure of blob storage, we create a specially tagged blob named
> path/to/dir whenever a directory called path/to/dir is created; files under
> that directory are then stored as normal blobs named path/to/dir/file. Many
> metrics are implemented for the file system using the Metrics2 interface.
> Tests mostly use a mock implementation of the Azure SDK functionality, with
> an option to test against real blob storage if configured (instructions are
> provided in README.txt).
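> A minimal Java sketch of the two ideas above, under stated assumptions: it is not the WASB code and does not use the Azure SDK. It splits a wasb[s] URI into container, account and blob name with java.net.URI, and keeps a flat, sorted in-memory key space (standing in for a container) where a directory is a blob tagged by a hypothetical isDirectoryMarker flag and a file is an ordinary blob named by its full path. The class name FlatBlobNamespaceSketch, the Blob record and the example account host are illustrative only.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/**
 * Hypothetical in-memory sketch of mapping a hierarchical namespace onto a
 * flat blob key space. Not the Hadoop WASB implementation.
 */
public class FlatBlobNamespaceSketch {

    /** Hypothetical blob record: only a flag marking directory marker blobs. */
    private record Blob(boolean isDirectoryMarker) {}

    // Flat key space: blob name -> blob, kept sorted so prefix scans are cheap.
    private final TreeMap<String, Blob> blobs = new TreeMap<>();

    /** Splits wasb[s]://<container>@<account>/path into {container, account, blobName}. */
    public static String[] parse(String uriString) {
        URI uri = URI.create(uriString);
        String scheme = uri.getScheme();
        if (!"wasb".equals(scheme) && !"wasbs".equals(scheme)) {
            throw new IllegalArgumentException("Not a wasb[s] URI: " + uriString);
        }
        String container = uri.getUserInfo();              // part before '@'
        String account = uri.getHost();                    // part after '@'
        String blobName = uri.getPath().replaceFirst("^/", ""); // drop leading '/'
        return new String[] {container, account, blobName};
    }

    /** Creates a tagged marker blob for dir and each of its ancestors. */
    public void mkdirs(String dir) {
        StringBuilder prefix = new StringBuilder();
        for (String part : dir.split("/")) {
            if (prefix.length() > 0) {
                prefix.append('/');
            }
            prefix.append(part);
            blobs.putIfAbsent(prefix.toString(), new Blob(true));
        }
    }

    /** Stores a file as an ordinary blob named by its full path. */
    public void createFile(String path) {
        int slash = path.lastIndexOf('/');
        if (slash > 0) {
            mkdirs(path.substring(0, slash)); // make sure parent markers exist
        }
        blobs.put(path, new Blob(false));
    }

    public boolean isDirectory(String path) {
        Blob blob = blobs.get(path);
        return blob != null && blob.isDirectoryMarker();
    }

    /** Lists immediate children of dir by scanning names that start with "dir/". */
    public List<String> list(String dir) {
        String prefix = dir + "/";
        List<String> children = new ArrayList<>();
        // Names sharing a prefix are contiguous in sorted order, so stop at the first miss.
        for (String name : blobs.tailMap(prefix, true).keySet()) {
            if (!name.startsWith(prefix)) {
                break;
            }
            if (!name.substring(prefix.length()).contains("/")) {
                children.add(name); // direct child, not a deeper descendant
            }
        }
        return children;
    }

    public static void main(String[] args) {
        String[] parts = parse("wasb://data@myaccount.blob.core.windows.net/path/to/file");
        System.out.println(String.join(" | ", parts));
        // prints: data | myaccount.blob.core.windows.net | path/to/file

        FlatBlobNamespaceSketch fs = new FlatBlobNamespaceSketch();
        fs.mkdirs("path/to/dir");
        fs.createFile("path/to/dir/file");
        System.out.println(fs.isDirectory("path/to/dir")); // prints: true
        System.out.println(fs.list("path/to/dir"));        // prints: [path/to/dir/file]
        System.out.println(fs.list("path/to"));            // prints: [path/to/dir]
    }
}
```

> Because blob names are kept sorted, everything under a directory forms one contiguous range, so listing a directory only walks that range. A real store would presumably rely on the blob service's own prefix listing instead of an in-memory map.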
> h2. Credits and history
> This has been ongoing work for a while; an early version can be seen in
> HADOOP-8079. This JIRA is a significant revision of that work. We'll post
> the patch for Hadoop trunk here first and, if it is accepted, then post a
> patch for branch-1 to backport the functionality. Credit for this work goes
> to the early team: [~minwei], [~davidlao], [~lengningliu] and [~stojanovic],
> as well as the many people who have taken over this work since then (hope I
> don't forget anyone): [~dexterb], Johannes Klein, [~ivanmi], Michael Rys,
> [~mostafae], [~brian_swan], [~mikelid], [~xifang], and [~chuanliu].
> h2. Test
> Besides unit tests, we have used WASB as the default file system in our
> service product (HDFS is also used, but not as the default file system).
> Various customer and test workloads have been run against clusters with
> this configuration for quite some time. The current version reflects the
> code tested and used in our production environment.
--
This message was sent by Atlassian JIRA
(v6.2#6252)