[ https://issues.apache.org/jira/browse/HADOOP-14520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16149691#comment-16149691 ]

Georgi Chalakov edited comment on HADOOP-14520 at 8/31/17 10:07 PM:
--------------------------------------------------------------------

Thank you for adding all these fixes. Stream capabilities looks like a useful 
feature.

Re:flush()
FSDataOutputStream doesn't override flush(), so a normal flush() call at the 
application level would not execute BlockBlobAppendStream::flush(). When block 
compaction is disabled, hflush()/hsync() are no-ops, and the performance of 
BlockBlobAppendStream is the same as (or better than) before for all operations.

Re:more than one append stream
We take a lease on the blob, which means that at any point in time there can be 
only one open append stream. If more than one append stream were open at the 
same time, we could not guarantee the order of write operations.

I have added an hsync() call and made isclosed volatile.

Re:close()
I think the first exception is the best indication of what went wrong. After an 
exception, close() is just best effort. I don't know how useful it would be for 
a client to continue after an IO-related exception, but if that is necessary, 
the client can continue. If block compaction is enabled, the client can still 
read all the data up to the last hflush()/hsync(). When block compaction is 
disabled, we guarantee nothing; the data may or may not be stored in the 
service.





> WASB: Block compaction for Azure Block Blobs
> --------------------------------------------
>
>                 Key: HADOOP-14520
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14520
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs/azure
>    Affects Versions: 3.0.0-alpha3
>            Reporter: Georgi Chalakov
>            Assignee: Georgi Chalakov
>         Attachments: HADOOP-14520-006.patch, HADOOP-14520-008.patch, 
> HADOOP-14520-009.patch, HADOOP-14520-05.patch, HADOOP_14520_07.patch, 
> HADOOP_14520_08.patch, HADOOP_14520_09.patch, HADOOP_14520_10.patch, 
> HADOOP-14520-patch-07-08.diff, HADOOP-14520-patch-07-09.diff
>
>
> Block Compaction for WASB allows uploading new blocks for every hflush/hsync 
> call. When the number of blocks is above 32000, the next hflush/hsync triggers 
> the block compaction process. Block compaction replaces a sequence of blocks 
> with one block. From all the sequences with total length less than 4M, 
> compaction chooses the longest one. It is a greedy algorithm that preserves 
> all potential candidates for the next round. Block Compaction for WASB 
> increases data durability and allows using block blobs instead of page blobs. 
> By default, block compaction is disabled. Similar to the configuration for 
> page blobs, the client needs to specify HDFS folders where block compaction 
> over block blobs is enabled. 
> Results for HADOOP_14520_07.patch
> tested endpoint: fs.azure.account.key.hdfs4.blob.core.windows.net
> Tests run: 777, Failures: 0, Errors: 0, Skipped: 155
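The greedy selection step described in the issue summary can be sketched as follows. This is an illustration of the idea under the stated 4 MB limit, not the actual BlockBlobAppendStream code; the function names are mine, and the real implementation works on block metadata rather than a plain list of sizes.

```python
# Hedged sketch of the greedy compaction step: among all contiguous
# block sequences whose total size stays under 4 MB, pick the longest
# one (most blocks) and merge it into a single block.
FOUR_MB = 4 * 1024 * 1024

def pick_sequence(block_sizes, limit=FOUR_MB):
    """Return (start, end) of the longest run with total size < limit."""
    best = (0, 0)                      # empty sequence
    start = total = 0
    for end, size in enumerate(block_sizes):
        total += size
        # Shrink the window from the left until it fits under the limit.
        while total >= limit and start <= end:
            total -= block_sizes[start]
            start += 1
        if (end + 1 - start) > (best[1] - best[0]):
            best = (start, end + 1)
    return best

def compact(block_sizes):
    """One compaction round: replace the chosen sequence with one block."""
    start, end = pick_sequence(block_sizes)
    if end - start < 2:                # nothing worth merging
        return block_sizes
    merged = sum(block_sizes[start:end])
    return block_sizes[:start] + [merged] + block_sizes[end:]
```

Because each round merges only the single best sequence and leaves the rest of the block list untouched, all other candidate sequences remain available for the next round, which is the "preserves all potential candidates" property mentioned above.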



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
