[
https://issues.apache.org/jira/browse/HADOOP-9629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Chuan Liu updated HADOOP-9629:
------------------------------
Summary: Support Windows Azure Storage - Blob as a file system in Hadoop
(was: Support Azure Blob Storage as a file system in Hadoop)
> Support Windows Azure Storage - Blob as a file system in Hadoop
> ---------------------------------------------------------------
>
> Key: HADOOP-9629
> URL: https://issues.apache.org/jira/browse/HADOOP-9629
> Project: Hadoop Common
> Issue Type: Improvement
> Reporter: Mostafa Elhemali
> Assignee: Mostafa Elhemali
> Attachments: HADOOP-9629.2.patch, HADOOP-9629.patch
>
>
> h2. Description
> This JIRA adds a new file system implementation for accessing Windows Azure
> Blob storage from within Hadoop, for example to use blobs as input to MR
> jobs or to configure MR jobs to write their output directly into blob
> storage.
> h2. High level design
> At a high level, the code extends the FileSystem class to provide an
> implementation for accessing blob storage. The scheme asv is used for
> access over HTTP, and asvs for access over HTTPS. Individual blobs are
> addressed with URIs of the form:
> {code}asv[s]://<container>@<account>/path/to/file{code}
> The standard Azure Java SDK (com.microsoft.windowsazure) does most of the
> work. To map a hierarchical file system onto the flat name-value nature of
> blob storage, we create a specially tagged blob named path/to/dir whenever
> we create a directory called path/to/dir; files under that directory are
> then stored as normal blobs named path/to/dir/file. A number of metrics are
> exposed through the Metrics2 interface. Tests mostly use a mock
> implementation of the Azure SDK functionality, with an option to run
> against real blob storage if configured (instructions are provided in
> RunningLiveAsvTests.txt).
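
To make the addressing and directory mapping described above concrete, here is a minimal sketch. It is not the code from the attached patches: the class AsvSketch, its method names, the in-memory map standing in for blob storage, and the account host used in main are all hypothetical. It only assumes that an asv[s] URI can be split with java.net.URI, and that a directory is represented by one tagged blob with the same name.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Illustrative sketch only; hypothetical names, not the patch's implementation. */
class AsvSketch {
    // Flat blob namespace: blob name -> true if the blob carries the directory tag.
    private final Map<String, Boolean> blobs = new TreeMap<>();

    /** Splits asv[s]://<container>@<account>/path into {container, account, blobName, useHttps}. */
    static String[] parse(String uriText) {
        URI uri = URI.create(uriText);
        String scheme = uri.getScheme();
        if (!"asv".equals(scheme) && !"asvs".equals(scheme)) {
            throw new IllegalArgumentException("Not an asv/asvs URI: " + uriText);
        }
        if (uri.getUserInfo() == null || uri.getHost() == null) {
            throw new IllegalArgumentException("Expected <container>@<account>: " + uriText);
        }
        // The blob name is the URI path without its leading slash.
        String path = uri.getPath();
        String blobName = path.startsWith("/") ? path.substring(1) : path;
        boolean useHttps = "asvs".equals(scheme);
        return new String[] {uri.getUserInfo(), uri.getHost(), blobName, String.valueOf(useHttps)};
    }

    /** Creating a directory stores one tagged blob with the directory's own name. */
    void mkdir(String name) {
        blobs.put(name, true);
    }

    /** Files are ordinary (untagged) blobs named dir/file. */
    void createFile(String name) {
        blobs.put(name, false);
    }

    boolean isDirectory(String name) {
        return Boolean.TRUE.equals(blobs.get(name));
    }

    /** Lists the direct children of dir by scanning the flat namespace for the prefix "dir/". */
    List<String> listChildren(String dir) {
        String prefix = dir + "/";
        List<String> children = new ArrayList<>();
        for (String name : blobs.keySet()) {
            if (!name.startsWith(prefix)) {
                continue;
            }
            String rest = name.substring(prefix.length());
            // Skip blobs that live in deeper "subdirectories".
            if (!rest.isEmpty() && rest.indexOf('/') < 0) {
                children.add(name);
            }
        }
        return children;
    }

    public static void main(String[] args) {
        // The account host below is a made-up placeholder.
        String[] parts = parse("asvs://data@myaccount.example.net/path/to/dir/file");
        System.out.println("container=" + parts[0] + " account=" + parts[1]
                + " blob=" + parts[2] + " https=" + parts[3]);

        AsvSketch fs = new AsvSketch();
        fs.mkdir("path/to/dir");
        fs.createFile("path/to/dir/file");
        System.out.println(fs.isDirectory("path/to/dir"));
        System.out.println(fs.listChildren("path/to/dir"));
    }
}
```

The sketch leaves open questions that the description does not answer, such as whether parent directories get their own tagged blobs and how the tag itself is stored; a real implementation would settle these through the Azure SDK rather than an in-memory map.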
> h2. Credits and history
> This work has been ongoing for a while; an early version can be seen in
> HADOOP-8079. This JIRA is a significant revision of that work. We'll post
> the patch for Hadoop trunk here first and, if it is accepted, post a patch
> for branch-1 as well to backport the functionality. Credit for
> this work goes to the early team: Min Wei, David Lao, Lengning Liu and
> Alexander Stojanovic as well as multiple people who have taken over this work
> since then (hope I don't forget anyone): Dexter Bradshaw, Johannes Klein,
> Ivan Mitic, Michael Rys and Mostafa Elhemali.