[
https://issues.apache.org/jira/browse/HADOOP-11188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14167554#comment-14167554
]
Hadoop QA commented on HADOOP-11188:
------------------------------------
{color:red}-1 overall{color}. Here are the results of testing the latest
attachment
http://issues.apache.org/jira/secure/attachment/12674253/hadoop-11188.01.patch
against trunk revision d3d3d47.
{color:green}+1 @author{color}. The patch does not contain any @author
tags.
{color:green}+1 tests included{color}. The patch appears to include 1 new
or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the
total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with
eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new
Findbugs (version 2.0.3) warnings.
{color:red}-1 release audit{color}. The applied patch generated 1
release audit warning.
{color:green}+1 core tests{color}. The patch passed unit tests in
hadoop-tools/hadoop-azure.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.
Test results:
https://builds.apache.org/job/PreCommit-HADOOP-Build/4901//testReport/
Release audit warnings:
https://builds.apache.org/job/PreCommit-HADOOP-Build/4901//artifact/patchprocess/patchReleaseAuditProblems.txt
Console output:
https://builds.apache.org/job/PreCommit-HADOOP-Build/4901//console
This message is automatically generated.
> hadoop-azure: automatically expand page blobs when they become full
> -------------------------------------------------------------------
>
> Key: HADOOP-11188
> URL: https://issues.apache.org/jira/browse/HADOOP-11188
> Project: Hadoop Common
> Issue Type: Improvement
> Components: fs
> Reporter: Eric Hanson
> Assignee: Eric Hanson
> Attachments: hadoop-11188.01.patch
>
>
> Right now, page blobs are initialized to a fixed size
> (fs.azure.page.blob.size) and cannot be expanded. This task is to make them
> automatically expand when they get to be nearly full.
> Design: if a write occurs that does not have enough room left in the file to
> finish, then flush all preceding operations, extend the file, and complete
> the write. Access to PageBlobOutputStream will be synchronized (exclusive
> access) so there won't be race conditions.
> The file will be extended by fs.azure.page.blob.extension.size bytes, which
> must be a multiple of 512. The internal default for
> fs.azure.page.blob.extension.size will be 128 * 1024 * 1024. The minimum
> extension size will be 4 * 1024 * 1024, which is the maximum write size, so
> the new write will finish.
> Extension will stop when the file size reaches 1TB. The final extension may
> be less than fs.azure.page.blob.extension.size if the remainder (1TB -
> current_file_size) is smaller than fs.azure.page.blob.extension.size.
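> A minimal sketch of the size computation described above (class, method, and
> constant names here are illustrative only, not taken from the patch):
> {code:java}
> // Hypothetical sketch of the extension-size computation; names and constants
> // are illustrative, not from the actual hadoop-11188.01.patch.
> public final class PageBlobExtensionSketch {
>   private static final long PAGE_SIZE = 512L;
>   private static final long MAX_BLOB_SIZE = 1024L * 1024 * 1024 * 1024; // 1TB cap
>   private static final long MIN_EXTENSION = 4L * 1024 * 1024;           // max single write size
>
>   /** Bytes to grow the blob by, or 0 if it is already at the 1TB cap. */
>   static long computeExtensionSize(long currentSize, long configuredExtension) {
>     long extension = Math.max(configuredExtension, MIN_EXTENSION);
>     // Round up to a multiple of the 512-byte page size.
>     extension = ((extension + PAGE_SIZE - 1) / PAGE_SIZE) * PAGE_SIZE;
>     // The final extension may be smaller so the blob never exceeds 1TB.
>     long remaining = MAX_BLOB_SIZE - currentSize;
>     return Math.max(0L, Math.min(extension, remaining));
>   }
> }
> {code}
> Under the design above, PageBlobOutputStream would, inside its existing
> synchronization, flush pending operations, compute the extension with
> something like this, resize the blob, and then complete the write.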
> An alternative to this is to make the default size 1TB. This is much simpler
> to implement. It's a one-line change. Or even simpler, don't change it at all
> because it is adequate for HBase.
> Rationale for this file size extension feature:
> 1) be able to download files to local disk easily with CloudXplorer and
> similar tools. Downloading a 1TB page blob is not practical if you don't have
> 1TB of local disk space, since on the local side it expands to the full file
> size, filled with zeros wherever there is no valid data.
> 2) don't make customers uncomfortable when they see large 1TB files. They
> often ask if they have to pay for it, even though they only pay for the space
> actually used in the page blob.
> I think rationale 2 is a relatively minor issue, because 98% of customers for
> HBase will never notice. They will just use it and not look at what kind of
> files are used for the logs. They don't pay for the unused space, so it is
> not a problem for them. We can document this. Also, if they use hadoop fs
> -ls, they will see the actual size of the files since I put in a fix for that.
> Rationale 1 is a minor issue because you cannot interpret the data on your
> local file system anyway due to the data format. So really, the only reason
> to copy data locally in its binary format would be if you are moving it
> around or archiving it. Copying a 1TB page blob from one location in the
> cloud to another is pretty fast with smart copy utilities that don't actually
> move the 0-filled parts of the file.
> Nevertheless, this is a convenience feature for users. They won't have to
> worry about setting fs.azure.page.blob.size under normal circumstances, and
> the files can grow as big as they need.
> If we make the change to extend the file size on the fly, that introduces new
> possible error or failure modes for HBase. We should include retry logic,
> along the lines of the sketch below.
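> A hedged sketch of what such retry logic might look like (the functional
> interface and backoff policy are assumptions, not part of the patch):
> {code:java}
> import java.io.IOException;
>
> // Hypothetical retry wrapper around the blob-extension call; nothing here is
> // taken from the actual patch.
> public final class ExtensionRetrySketch {
>   interface BlobResizeOp {
>     void run() throws IOException;
>   }
>
>   static void extendWithRetry(BlobResizeOp resize, int maxAttempts) throws IOException {
>     if (maxAttempts < 1) {
>       throw new IllegalArgumentException("maxAttempts must be >= 1");
>     }
>     IOException last = null;
>     for (int attempt = 1; attempt <= maxAttempts; attempt++) {
>       try {
>         resize.run();
>         return;                            // extension succeeded
>       } catch (IOException e) {
>         last = e;                          // assume a transient storage failure
>         try {
>           Thread.sleep(1000L * attempt);   // simple linear backoff
>         } catch (InterruptedException ie) {
>           Thread.currentThread().interrupt();
>           throw new IOException("Interrupted while retrying blob extension", ie);
>         }
>       }
>     }
>     throw last;
>   }
> }
> {code}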
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)