[ https://issues.apache.org/jira/browse/HADOOP-9629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chuan Liu updated HADOOP-9629:
------------------------------

    Summary: Support Windows Azure Storage - Blob as a file system in Hadoop  
(was: Support Azure Blob Storage as a file system in Hadoop)

> Support Windows Azure Storage - Blob as a file system in Hadoop
> ---------------------------------------------------------------
>
>                 Key: HADOOP-9629
>                 URL: https://issues.apache.org/jira/browse/HADOOP-9629
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Mostafa Elhemali
>            Assignee: Mostafa Elhemali
>         Attachments: HADOOP-9629.2.patch, HADOOP-9629.patch
>
>
> h2. Description
> This JIRA adds a new file system implementation for accessing Windows Azure 
> Blob storage from within Hadoop, for example to use blobs as input to MR 
> jobs or to configure MR jobs to write their output directly to blob 
> storage.
> h2. High level design
> At a high level, the code here extends the FileSystem class to provide an 
> implementation for accessing blob storage; the scheme asv is used for 
> accessing it over HTTP, and asvs for accessing over HTTPS. We use the URI 
> scheme: {code}asv[s]://<container>@<account>/path/to/file{code} to address 
> individual blobs. We use the standard Azure Java SDK 
> (com.microsoft.windowsazure) to do most of the work. To map a hierarchical 
> file system onto the flat name-value namespace of blob storage, we create a 
> specially tagged blob named path/to/dir whenever a directory called 
> path/to/dir is created; files under that directory are stored as normal 
> blobs named path/to/dir/file. A number of metrics are implemented using the 
> Metrics2 interface. Tests mostly use a mock implementation of the Azure SDK 
> functionality, with an option to test against real blob storage if 
> configured (instructions are provided in RunningLiveAsvTests.txt).
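>
> The addressing and directory-marker ideas above can be sketched roughly as 
> follows. This is a minimal illustration, not code from the patch and not the 
> Azure SDK API: the names AsvPathSketch, BlobAddress, parse, mkdir, 
> createFile and listUnder are hypothetical, the blob store is just an 
> in-memory map, and the boolean value stands in for whatever tag the real 
> implementation sets on directory blobs.
>

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Illustrative sketch only; all names here are hypothetical. */
final class AsvPathSketch {

    /** Parsed form of asv[s]://<container>@<account>/path/to/file. */
    static final class BlobAddress {
        final boolean https;      // true for asvs, false for asv
        final String container;
        final String account;
        final String blobName;    // path without the leading slash

        BlobAddress(boolean https, String container, String account, String blobName) {
            this.https = https;
            this.container = container;
            this.account = account;
            this.blobName = blobName;
        }
    }

    /** Splits an asv/asvs URI into transport, container, account and blob name. */
    static BlobAddress parse(String uriText) {
        URI uri = URI.create(uriText);
        String scheme = uri.getScheme();
        if (!"asv".equals(scheme) && !"asvs".equals(scheme)) {
            throw new IllegalArgumentException("Expected asv or asvs scheme: " + uriText);
        }
        String authority = uri.getAuthority();  // "<container>@<account>"
        int at = (authority == null) ? -1 : authority.indexOf('@');
        if (at <= 0 || at == authority.length() - 1) {
            throw new IllegalArgumentException("Expected <container>@<account>: " + uriText);
        }
        String path = uri.getPath();
        String blobName = path.startsWith("/") ? path.substring(1) : path;
        return new BlobAddress("asvs".equals(scheme),
                authority.substring(0, at), authority.substring(at + 1), blobName);
    }

    /** Flat namespace: blob name -> true if the blob is a tagged directory marker. */
    private final Map<String, Boolean> blobs = new TreeMap<>();

    /** Creating a directory stores a tagged blob with the directory's own name. */
    void mkdir(String dir) {
        blobs.put(dir, Boolean.TRUE);
    }

    /** Files are ordinary (untagged) blobs whose names include the full path. */
    void createFile(String file) {
        blobs.put(file, Boolean.FALSE);
    }

    boolean isDirectory(String name) {
        return Boolean.TRUE.equals(blobs.get(name));
    }

    /** Returns every blob name under dir (all descendants), found by name prefix. */
    List<String> listUnder(String dir) {
        String prefix = dir + "/";
        List<String> result = new ArrayList<>();
        for (String name : blobs.keySet()) {
            if (name.startsWith(prefix)) {
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        BlobAddress address = parse("asvs://mycontainer@myaccount/path/to/dir/file");
        AsvPathSketch store = new AsvPathSketch();
        store.mkdir("path/to/dir");
        store.createFile(address.blobName);
        System.out.println(address.container + "@" + address.account
                + " https=" + address.https);
        System.out.println(store.listUnder("path/to/dir"));
    }
}
```

> Splitting the authority on '@' by hand, rather than relying on the user-info 
> and host accessors of java.net.URI, is a choice made for this sketch so that 
> the container and account parts are recovered even if the account name is 
> not a syntactically valid host name.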
> h2. Credits and history
> This work has been ongoing for a while; an early version can be seen in 
> HADOOP-8079. This JIRA is a significant revision of that work. We'll post 
> the patch for Hadoop trunk here first, then, if it is accepted, post a patch 
> for branch-1 as well to backport the functionality. Credit for 
> this work goes to the early team: Min Wei, David Lao, Lengning Liu and 
> Alexander Stojanovic as well as multiple people who have taken over this work 
> since then (hope I don't forget anyone): Dexter Bradshaw, Johannes Klein, 
> Ivan Mitic, Michael Rys and Mostafa Elhemali.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
