[ 
https://issues.apache.org/jira/browse/HADOOP-9629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth updated HADOOP-9629:
----------------------------------

    Attachment: HADOOP-9629.trunk.2.patch

I'm re-uploading the same HADOOP-9629.trunk.2.patch file, just to trigger a 
Jenkins run.

> Support Windows Azure Storage - Blob as a file system in Hadoop
> ---------------------------------------------------------------
>
>                 Key: HADOOP-9629
>                 URL: https://issues.apache.org/jira/browse/HADOOP-9629
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Mostafa Elhemali
>            Assignee: Mike Liddell
>         Attachments: HADOOP-9629 - Azure Filesystem - Information for 
> developers.docx, HADOOP-9629 - Azure Filesystem - Information for 
> developers.pdf, HADOOP-9629.2.patch, HADOOP-9629.3.patch, HADOOP-9629.patch, 
> HADOOP-9629.trunk.1.patch, HADOOP-9629.trunk.2.patch
>
>
> h2. Description
> This JIRA adds a new file system implementation for accessing Windows Azure 
> Storage - Blob from within Hadoop, for example to use blobs as input to MR 
> jobs or to have MR jobs write their output directly into blob storage.
> h2. High level design
> At a high level, the code here extends the FileSystem class to provide an 
> implementation for accessing blob storage; the scheme wasb is used for 
> accessing it over HTTP, and wasbs for accessing over HTTPS. We use the URI 
> scheme: {code}wasb[s]://<container>@<account>/path/to/file{code} to address 
> individual blobs. We use the standard Azure Java SDK 
> (com.microsoft.windowsazure) to do most of the work. In order to map a 
> hierarchical file system over the flat name-value pair nature of blob 
> storage, we create a specially tagged blob named path/to/dir whenever we 
> create a directory called path/to/dir; files under it are then stored as 
> normal blobs path/to/dir/file. The file system also exposes many metrics through 
> the Metrics2 interface. Tests are implemented mostly using a mock 
> implementation for the Azure SDK functionality, with an option to test 
> against real blob storage if configured (instructions are provided in 
> README.txt).
> h2. Credits and history
> This has been ongoing work for a while, and the early version of this work 
> can be seen in HADOOP-8079. This JIRA is a significant revision of that and 
> we'll post the patch here for Hadoop trunk first, then post a patch for 
> branch-1 as well for backporting the functionality if accepted. Credit for 
> this work goes to the early team: [~minwei], [~davidlao], [~lengningliu] and 
> [~stojanovic] as well as multiple people who have taken over this work since 
> then (hope I don't forget anyone): [~dexterb], Johannes Klein, [~ivanmi], 
> Michael Rys, [~mostafae], [~brian_swan], [~mikelid], [~xifang], and 
> [~chuanliu].
> h2. Test
> Besides unit tests, we have used WASB as the default file system in our 
> service product. (HDFS is also used, but not as the default file system.) 
> Various customer and test workloads have been run against clusters with 
> such configurations for quite some time. The current version reflects the 
> version of the code tested and used in our production environment.
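
To make the addressing scheme and the directory mapping from the quoted "High level design" section more concrete, here is a minimal Java sketch. It is not taken from the patch: the class names WasbSketch, WasbLocation and FlatStore, the in-memory map standing in for a blob container, and the example account host name are all assumptions made for illustration. It only shows how a wasb[s]://<container>@<account>/path/to/file URI might be split into its parts, and how a tagged marker blob can represent a directory over a flat key space.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

public class WasbSketch {

    /** Parts of a wasb[s]://<container>@<account>/path URI (hypothetical holder type). */
    public static final class WasbLocation {
        public final boolean secure;   // true for wasbs (HTTPS), false for wasb (HTTP)
        public final String container;
        public final String account;
        public final String blobKey;   // path without the leading slash

        public WasbLocation(boolean secure, String container, String account, String blobKey) {
            this.secure = secure;
            this.container = container;
            this.account = account;
            this.blobKey = blobKey;
        }
    }

    /** Splits a wasb/wasbs URI using java.net.URI: user-info is the container, host is the account. */
    public static WasbLocation parse(String text) {
        URI uri = URI.create(text);
        String scheme = uri.getScheme();
        if (!"wasb".equals(scheme) && !"wasbs".equals(scheme)) {
            throw new IllegalArgumentException("Not a wasb[s] URI: " + text);
        }
        if (uri.getUserInfo() == null || uri.getHost() == null) {
            throw new IllegalArgumentException("Expected <container>@<account>: " + text);
        }
        String path = uri.getPath();
        String key = path.startsWith("/") ? path.substring(1) : path;
        return new WasbLocation("wasbs".equals(scheme), uri.getUserInfo(), uri.getHost(), key);
    }

    /** In-memory stand-in for a flat blob container: key -> "is a directory marker". */
    public static final class FlatStore {
        private final TreeMap<String, Boolean> blobs = new TreeMap<>();

        /** A directory is recorded as a specially tagged blob named after the directory. */
        public void mkdir(String dir) { blobs.put(dir, Boolean.TRUE); }

        /** A file is a normal blob whose key is its full path. */
        public void createFile(String key) { blobs.put(key, Boolean.FALSE); }

        public boolean isDirectory(String key) { return Boolean.TRUE.equals(blobs.get(key)); }

        /** Lists direct children of dir by scanning the keys that share the "dir/" prefix. */
        public List<String> listChildren(String dir) {
            String prefix = dir + "/";
            List<String> children = new ArrayList<>();
            for (String key : blobs.tailMap(prefix, true).keySet()) {
                if (!key.startsWith(prefix)) {
                    break; // sorted keys: the prefix range has ended
                }
                String rest = key.substring(prefix.length());
                if (!rest.isEmpty() && rest.indexOf('/') < 0) {
                    children.add(key);
                }
            }
            return children;
        }
    }

    public static void main(String[] args) {
        // The account host below is an assumed example value.
        WasbLocation loc = parse("wasbs://mycontainer@myaccount.blob.core.windows.net/path/to/dir/file");
        System.out.println(loc.container + " @ " + loc.account + " -> " + loc.blobKey);

        FlatStore store = new FlatStore();
        store.mkdir("path/to/dir");
        store.createFile(loc.blobKey);
        System.out.println(store.listChildren("path/to/dir"));
    }
}
```

The sorted map is only a convenience for prefix scans in this sketch; how the actual patch lists directory contents through the Azure SDK is not described in the text above.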



--
This message was sent by Atlassian JIRA
(v6.2#6252)
