[jira] [Updated] (HADOOP-10809) hadoop-azure: page blob support

Eric Hanson (JIRA) Wed, 08 Oct 2014 15:20:07 -0700

     [ https://issues.apache.org/jira/browse/HADOOP-10809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Eric Hanson updated HADOOP-10809:
---------------------------------
    Description: 
Azure Blob Storage provides two flavors: block-blobs and page-blobs.
Block-blobs are the general-purpose kind; they support convenient APIs and are
the basis for the Azure Filesystem for Hadoop (see HADOOP-9629).

Page-blobs use the same namespace as block-blobs but provide a different
low-level feature set.  Most importantly, page-blobs can cope with an
effectively unlimited number of small accesses, whereas a block-blob tolerates
only 50K appends before its data must be rewritten, a largely manual process.
A simple analogy is that page-blobs are like a regular disk and the basic API
is like a low-level device driver.
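
To make the disk analogy concrete: Azure page-blobs are organized as 512-byte
pages, and page writes must start and end on 512-byte boundaries, so even a
tiny log append touches whole pages. The sketch below only computes which page
range a write falls in; the class and method names are hypothetical, and how
hadoop-azure actually lays out data inside those pages is not described here.

```java
// Hypothetical helper: round an arbitrary byte range out to the 512-byte
// page boundaries that Azure page-blob writes must respect.
// Not part of hadoop-azure or the Azure SDK.
final class PageAlignment {
    static final long PAGE_SIZE = 512;

    // First byte of the page containing 'offset'.
    static long alignDown(long offset) {
        return (offset / PAGE_SIZE) * PAGE_SIZE;
    }

    // Smallest multiple of PAGE_SIZE that is >= 'offset'.
    static long alignUp(long offset) {
        return ((offset + PAGE_SIZE - 1) / PAGE_SIZE) * PAGE_SIZE;
    }

    // Returns {start, endExclusive} of the pages that a write of 'length'
    // bytes at 'offset' touches; the caller must rewrite those whole pages.
    static long[] pageRange(long offset, long length) {
        if (offset < 0 || length <= 0) {
            throw new IllegalArgumentException("offset must be >= 0 and length > 0");
        }
        return new long[] {alignDown(offset), alignUp(offset + length)};
    }

    public static void main(String[] args) {
        // A 100-byte append landing at offset 1000 spans two pages.
        long[] r = pageRange(1000, 100);
        System.out.println(r[0] + ".." + (r[1] - 1));  // prints 512..1535
    }
}
```

This is the "low-level device driver" side of the analogy: the caller, not the
storage service, is responsible for turning small appends into page writes.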

See http://msdn.microsoft.com/en-us/library/azure/ee691964.aspx for some 
introductory material.

The primary driving scenario for page-blob support is HBase transaction log
files, which require an access pattern of many small writes.  Additional
scenarios can also be supported.

Configuration:
The Hadoop Filesystem abstraction needs a mechanism so that file-create can
determine whether to create a block- or page-blob.  To support scenarios where
application code does not know the details of Azure Storage, we would like the
configuration to be Aspect-style, i.e. configured by the administrator and
transparent to the application.  The current solution is to use Hadoop
configuration to declare a list of page-blob folders; Azure Filesystem for
Hadoop creates files in these folders as page-blobs.  The configuration key is
"fs.azure.page.blob.dir", and its description can be found in
AzureNativeFileSystemStore.java.
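
A minimal sketch of the folder-based decision, assuming the value of
fs.azure.page.blob.dir is a comma-separated list of folder paths and that a
file belongs to a folder when the folder is a whole-component prefix of its
path. The class PageBlobDirPolicy and the example folders /hbase/WALs and
/hbase/oldWALs are hypothetical; the real logic lives in
AzureNativeFileSystemStore.java and may differ in detail.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not the hadoop-azure implementation: decide at
// file-create time whether a path should become a page-blob, given an
// (assumed) comma-separated list of page-blob folders.
final class PageBlobDirPolicy {
    private final List<String> pageBlobDirs = new ArrayList<>();

    PageBlobDirPolicy(String configValue) {
        if (configValue != null) {
            for (String entry : configValue.split(",")) {
                String dir = normalize(entry.trim());
                // Entries that normalize to empty (e.g. "" or "/") are ignored here.
                if (!dir.isEmpty()) {
                    pageBlobDirs.add(dir);
                }
            }
        }
    }

    // Strip leading/trailing slashes so "/hbase/WALs/" and "hbase/WALs" compare equal.
    private static String normalize(String path) {
        String p = path;
        while (p.startsWith("/")) p = p.substring(1);
        while (p.endsWith("/")) p = p.substring(0, p.length() - 1);
        return p;
    }

    // True if 'path' is a configured folder or lies beneath one. Matching is by
    // whole path component, so "/hbase/WALs2/x" does not match "/hbase/WALs".
    boolean usePageBlob(String path) {
        String p = normalize(path);
        for (String dir : pageBlobDirs) {
            if (p.equals(dir) || p.startsWith(dir + "/")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        PageBlobDirPolicy policy = new PageBlobDirPolicy("/hbase/WALs,/hbase/oldWALs");
        System.out.println(policy.usePageBlob("/hbase/WALs/rs1/log.1"));  // true
        System.out.println(policy.usePageBlob("/user/alice/data.csv"));   // false
    }
}
```

Because the decision depends only on the path and the administrator's
setting, application code calls the ordinary create() and never names a blob
type, which is the Aspect-style property described above.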

Code changes:
- refactor of basic Azure Filesystem code to use a general BlobWrapper with
specialized BlockBlobWrapper and PageBlobWrapper implementations
- introduction of PageBlob support (read, write, etc.)
- miscellaneous changes such as umask handling, implementation of
createNonRecursive(), and flush/hflush/hsync.
- new unit tests.
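
The wrapper split in the first bullet can be pictured with a small in-memory
model: a general interface for operations every blob supports, and two
specializations exposing the flavor-specific write paths. Everything below is
hypothetical (method names, in-memory storage, and the assumption that each
append consumes one block toward the 50K limit); it illustrates the shape of
the abstraction, not the patch's actual classes, and it does not call Azure
Storage.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// General wrapper: operations common to both blob flavors.
interface BlobWrapper {
    String name();
    byte[] download();  // entire current contents
}

// Block-blob flavor: append-oriented, bounded number of blocks.
final class BlockBlobWrapper implements BlobWrapper {
    static final int MAX_BLOCKS = 50_000;  // assumes one append == one block
    private final String name;
    private final ByteArrayOutputStream committed = new ByteArrayOutputStream();
    private int blockCount = 0;

    BlockBlobWrapper(String name) { this.name = name; }

    void appendBlock(byte[] data) {
        if (blockCount >= MAX_BLOCKS) {
            throw new IllegalStateException("block limit reached; rewrite required");
        }
        committed.write(data, 0, data.length);
        blockCount++;
    }

    public String name() { return name; }
    public byte[] download() { return committed.toByteArray(); }
}

// Page-blob flavor: fixed-size, disk-like, overwritten in 512-byte pages.
final class PageBlobWrapper implements BlobWrapper {
    static final int PAGE_SIZE = 512;
    private final String name;
    private final byte[] pages;

    PageBlobWrapper(String name, int sizeInPages) {
        this.name = name;
        this.pages = new byte[sizeInPages * PAGE_SIZE];
    }

    // Overwrite whole pages in place; offset and length must be page-aligned.
    void writePages(long offset, byte[] data) {
        if (offset < 0 || offset % PAGE_SIZE != 0 || data.length % PAGE_SIZE != 0) {
            throw new IllegalArgumentException("writes must be 512-byte aligned");
        }
        if (offset + data.length > pages.length) {
            throw new IllegalArgumentException("write past end of blob");
        }
        System.arraycopy(data, 0, pages, (int) offset, data.length);
    }

    public String name() { return name; }
    public byte[] download() { return Arrays.copyOf(pages, pages.length); }
}

final class BlobWrapperDemo {
    public static void main(String[] args) {
        BlockBlobWrapper block = new BlockBlobWrapper("data/part-0000");
        block.appendBlock("hello ".getBytes(StandardCharsets.UTF_8));
        block.appendBlock("world".getBytes(StandardCharsets.UTF_8));

        PageBlobWrapper page = new PageBlobWrapper("hbase/WALs/log.1", 4);
        page.writePages(512, new byte[512]);

        // Code that only reads can treat both flavors through the general wrapper.
        for (BlobWrapper b : new BlobWrapper[] {block, page}) {
            System.out.println(b.name() + ": " + b.download().length + " bytes");
        }
        // prints "data/part-0000: 11 bytes" then "hbase/WALs/log.1: 2048 bytes"
    }
}
```

Keeping the common operations in one interface lets most filesystem code stay
flavor-agnostic, while create and write paths pick the specialization.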

Credit for the primary patch: Dexter Bradshaw, Mostafa Elhemali, Eric Hanson, 
Mike Liddell.

Also included in the patch is support for atomic folder rename over the Azure 
blob store through the Azure file system layer for Hadoop. See the README file 
for more details, including how to use the fs.azure.atomic.rename.dir 
configuration variable to control where atomic folder rename logic is applied. 
By default, folders under /hbase have atomic rename applied, which is needed 
for correct operation of HBase.
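
The same folder-list idea decides where atomic folder rename is applied. The
sketch below assumes fs.azure.atomic.rename.dir is a comma-separated list,
that the check is made against the source folder of the rename, and that an
explicit setting replaces the /hbase default rather than extending it (the
README is the authority on all three). A Map stands in for the Hadoop
Configuration object; the class name is hypothetical.

```java
import java.util.Map;

// Hypothetical sketch, not the hadoop-azure implementation.
final class AtomicRenamePolicy {
    static final String KEY = "fs.azure.atomic.rename.dir";
    static final String DEFAULT_DIRS = "/hbase";  // default described above

    // True if renaming 'srcFolder' should go through the atomic-rename logic.
    // Assumption: a configured value replaces the default entirely.
    static boolean needsAtomicRename(Map<String, String> conf, String srcFolder) {
        String value = conf.getOrDefault(KEY, DEFAULT_DIRS);
        String src = strip(srcFolder);
        for (String entry : value.split(",")) {
            String dir = strip(entry.trim());
            if (!dir.isEmpty() && (src.equals(dir) || src.startsWith(dir + "/"))) {
                return true;
            }
        }
        return false;
    }

    // Remove leading/trailing slashes so paths compare component by component.
    private static String strip(String p) {
        String s = p;
        while (s.startsWith("/")) s = s.substring(1);
        while (s.endsWith("/")) s = s.substring(0, s.length() - 1);
        return s;
    }

    public static void main(String[] args) {
        Map<String, String> conf = Map.of();  // key unset, so /hbase applies
        System.out.println(needsAtomicRename(conf, "/hbase/data/t1"));  // true
        System.out.println(needsAtomicRename(conf, "/tmp/staging"));    // false
    }
}
```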

  was:
Same as the description above, but without the final paragraph on atomic
folder rename.


> hadoop-azure: page blob support
> -------------------------------
>
>                 Key: HADOOP-10809
>                 URL: https://issues.apache.org/jira/browse/HADOOP-10809
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: tools
>            Reporter: Mike Liddell
>            Assignee: Eric Hanson
>         Attachments: HADOOP-10809.02.patch, HADOOP-10809.03.patch, 
> HADOOP-10809.04.patch, HADOOP-10809.05.patch, HADOOP-10809.06.patch, 
> HADOOP-10809.07.patch, HADOOP-10809.08.patch, HADOOP-10809.09.patch, 
> HADOOP-10809.1.patch, HADOOP-10809.10.patch, HADOOP-10809.11.patch



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
