[ https://issues.apache.org/jira/browse/HADOOP-11188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14167554#comment-14167554 ]

Hadoop QA commented on HADOOP-11188:
------------------------------------

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12674253/hadoop-11188.01.patch
  against trunk revision d3d3d47.

    {color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

    {color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test file.

    {color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

    {color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

    {color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

    {color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

        {color:red}-1 release audit{color}.  The applied patch generated 1 
release audit warning.

    {color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-tools/hadoop-azure.

    {color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/4901//testReport/
Release audit warnings: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/4901//artifact/patchprocess/patchReleaseAuditProblems.txt
Console output: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/4901//console

This message is automatically generated.

> hadoop-azure: automatically expand page blobs when they become full
> -------------------------------------------------------------------
>
>                 Key: HADOOP-11188
>                 URL: https://issues.apache.org/jira/browse/HADOOP-11188
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs
>            Reporter: Eric Hanson
>            Assignee: Eric Hanson
>         Attachments: hadoop-11188.01.patch
>
>
> Right now, page blobs are initialized to a fixed size 
> (fs.azure.page.blob.size) and cannot be expanded. This task is to make them 
> automatically expand when they get to be nearly full.
> Design: if a write occurs that does not have enough room in the file to 
> finish, then flush all preceding operations, extend the file, and complete 
> the write. Access to the PageBlobOutputStream will be synchronized (exclusive 
> access), so there won't be race conditions. (A sketch of this extension logic 
> follows the quoted description below.)
> The file will be extended by fs.azure.page.blob.extension.size bytes, which 
> must be a multiple of 512. The internal default for 
> fs.azure.page.blob.extension.size will be 128 * 1024 * 1024. The minimum 
> extension size will be 4 * 1024 * 1024, which is the maximum write size, so 
> the new write is guaranteed to finish. 
> Extension will stop when the file size reaches 1TB. The final extension may 
> be less than fs.azure.page.blob.extension.size if the remainder (1TB - 
> current_file_size) is smaller than fs.azure.page.blob.extension.size.
> An alternative is to make the default size 1TB, which is much simpler to 
> implement (a one-line change). Even simpler, don't change it at all, since the 
> current behavior is adequate for HBase.
> Rationale for this file size extension feature:
> 1) Be able to download files to local disk easily with CloudXplorer and 
> similar tools. Downloading a 1TB page blob is not practical if you don't have 
> 1TB of local disk space, since on the local side it expands to the full file 
> size, filled with zeros where there is no valid data.
> 2) Don't make customers uncomfortable when they see large 1TB files. They 
> often ask if they have to pay for them, even though they only pay for the 
> space actually used in the page blob.
> I think rationale 2 is a relatively minor issue, because 98% of customers for 
> HBase will never notice. They will just use it and not look at what kind of 
> files are used for the logs. They don't pay for the unused space, so it is 
> not a problem for them. We can document this. Also, if they use hadoop fs 
> -ls, they will see the actual size of the files since I put in a fix for that.
> Rationale 1 is a minor issue because you cannot interpret the data on your 
> local file system anyway due to the data format. So really, the only reason 
> to copy data locally in its binary format would be if you are moving it 
> around or archiving it. Copying a 1TB page blob from one location in the 
> cloud to another is pretty fast with smart copy utilities that don't actually 
> move the 0-filled parts of the file.
> Nevertheless, this is a convenience feature for users. They won't have to 
> worry about setting fs.azure.page.blob.size under normal circumstances and 
> can make the files grow as big as they want.
> If we make the change to extend the file size on the fly, that introduces new 
> possible error or failure modes for HBase. We should include retry logic.
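
For discussion, here is a minimal, self-contained sketch of the extension-size 
rule described above (default 128 MB, minimum 4 MB, 512-byte alignment, 1 TB cap). 
The class, method, and constant names are illustrative placeholders, not the code 
in hadoop-11188.01.patch.

{code:java}
// Illustrative sketch only -- not the contents of hadoop-11188.01.patch.
public class PageBlobExtensionSketch {

  static final long PAGE_SIZE = 512L;
  static final long MAX_BLOB_SIZE = 1024L * 1024 * 1024 * 1024; // 1 TB cap
  static final long MIN_EXTENSION = 4L * 1024 * 1024;           // = maximum single write size
  static final long DEFAULT_EXTENSION = 128L * 1024 * 1024;     // fs.azure.page.blob.extension.size default

  /**
   * Returns how many bytes to grow the blob by, or 0 if the 1 TB cap is reached.
   * currentSize is assumed to be 512-aligned, as page blob sizes always are.
   */
  static long computeExtension(long currentSize, long configuredExtension) {
    long extension = Math.max(configuredExtension, MIN_EXTENSION);
    // Round up to a multiple of 512, as page blob sizes must be.
    extension = ((extension + PAGE_SIZE - 1) / PAGE_SIZE) * PAGE_SIZE;
    // The final extension may be smaller so the blob never exceeds 1 TB.
    return Math.min(extension, MAX_BLOB_SIZE - currentSize);
  }

  public static void main(String[] args) {
    long size = 256L * 1024 * 1024 * 1024;               // 256 GB blob, nearly full
    System.out.println(computeExtension(size, DEFAULT_EXTENSION));    // 134217728 (128 MB)

    long nearCap = MAX_BLOB_SIZE - 100L * 1024 * 1024;   // only 100 MB of headroom left
    System.out.println(computeExtension(nearCap, DEFAULT_EXTENSION)); // 104857600 (100 MB)
  }
}
{code}

In the write path, PageBlobOutputStream would, under its existing synchronization, 
flush all pending operations, resize the blob by the computed amount, and then 
complete the blocked write; a result of 0 would mean the 1 TB limit has been 
reached and the write has to fail.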



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
