[ https://issues.apache.org/jira/browse/HADOOP-14520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16133912#comment-16133912 ]

Thomas Marquardt commented on HADOOP-14520:
-------------------------------------------

I will hand this off to Georgi, as he is returning from vacation Monday.  I 
noticed the following while reviewing the latest patches:

1) {{writeBlockRequestInternal}} has retry logic that returns the buffer to the 
pool and then retries using the buffer it has just returned.  The buffer must 
not be released back to the pool until the last retry is finished with it; a 
corrected ordering is sketched below.
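
A minimal sketch of the safe ordering ({{MAX_ATTEMPTS}}, {{upload}}, and the 
pool type here are illustrative stand-ins, not the patch's actual identifiers):

{code:java}
import java.io.IOException;
import java.nio.ByteBuffer;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class RetryOrdering {
  private static final int MAX_ATTEMPTS = 3;
  private final BlockingQueue<ByteBuffer> pool = new ArrayBlockingQueue<>(16);

  void writeWithRetry(ByteBuffer buffer) throws IOException {
    try {
      IOException last = null;
      for (int attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
        try {
          upload(buffer);   // may throw on transient failures
          return;           // success; finally still runs and releases the buffer
        } catch (IOException e) {
          last = e;         // keep the buffer, the retry reuses its contents
        }
      }
      throw last;
    } finally {
      pool.offer(buffer);   // released exactly once, after the final use
    }
  }

  private void upload(ByteBuffer buffer) throws IOException {
    // placeholder for the PUT Block request
  }
}
{code}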

2) {{writeBlockRequestInternal}} is currently returning a byte array originally 
created by {{ByteArrayOutputStream}} to the buffer pool.  If this is not clear, 
look at {{blockCompaction}}: it creates a {{ByteArrayOutputStreamInternal}}, 
wraps the underlying {{byte[]}} in a {{ByteBuffer}}, and passes it to 
{{writeBlockRequestInternal}}, which returns it to the pool.  The pool ends up 
holding arrays it never allocated, whose capacities may not match the pool's 
buffer size.
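
To make the ownership problem concrete, a stripped-down illustration (the 
{{ByteArrayOutputStreamInternal}} shown is a stand-in for the class in the 
patch):

{code:java}
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;

// Stand-in for the patch's ByteArrayOutputStreamInternal: exposes the
// backing array without a defensive copy.
class ByteArrayOutputStreamInternal extends ByteArrayOutputStream {
  byte[] innerBuffer() {
    return buf;   // protected field inherited from ByteArrayOutputStream
  }
}

class Demo {
  public static void main(String[] args) {
    ByteArrayOutputStreamInternal out = new ByteArrayOutputStreamInternal();
    out.write(42);
    // This ByteBuffer is backed by an array the pool never allocated.
    // Handing it back to the pool mixes foreign arrays of arbitrary
    // capacity with pool-owned ones; only buffers obtained from the
    // pool should ever be returned to it.
    ByteBuffer wrapped = ByteBuffer.wrap(out.innerBuffer(), 0, out.size());
  }
}
{code}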

3) {{blockCompaction}} can be refactored to make unit testing easy.  For 
example, extracting a {{getBlockSequenceForCompaction}} function that takes 
a block list as input and returns the sequence of blocks to be compacted would 
allow a data-driven unit test to run many different block lists through the 
algorithm.
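
A rough sketch of such a pure function, assuming the selection rule from the 
issue description (the longest run of two or more blocks whose sizes sum to 
less than the maximum block size); names and types are illustrative:

{code:java}
import java.util.List;

final class CompactionPlanner {
  static final long MAX_BLOCK_SIZE = 4L * 1024 * 1024;   // 4 MB, per the description

  /**
   * Pure function over the block list: returns the half-open index range
   * [start, end) of the longest run of two or more blocks whose sizes sum
   * to less than MAX_BLOCK_SIZE, or null when no such run exists.
   */
  static int[] getBlockSequenceForCompaction(List<Long> blockSizes) {
    int bestStart = -1;
    int bestLen = 0;
    for (int start = 0; start < blockSizes.size(); start++) {
      long sum = 0;
      int end = start;
      while (end < blockSizes.size() && sum + blockSizes.get(end) < MAX_BLOCK_SIZE) {
        sum += blockSizes.get(end);
        end++;
      }
      if (end - start >= 2 && end - start > bestLen) {
        bestStart = start;
        bestLen = end - start;
      }
    }
    return bestStart < 0 ? null : new int[] { bestStart, bestStart + bestLen };
  }
}
{code}

A parameterized test can then feed lists of block sizes straight into this 
function and assert on the returned range, without touching the stream or the 
storage service.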

4) I recommend the following description for the {{blockCompaction}} function:


{code:java}
/**
 * Block compaction is only enabled when the number of blocks exceeds
 * activateCompactionBlockCount.  The algorithm searches for the longest
 * sequence of two or more blocks {b1, b2, ..., bn} such that
 * size(b1) + size(b2) + ... + size(bn) < maximum-block-size.  It then
 * downloads the blocks in the sequence, concatenates the data to form a
 * single block, uploads this new block, and updates the block list to
 * replace the sequence of blocks with the new block.
 */
{code}

5) I recommend renaming {{BlockBlobAppendStream.bufferSize}} to 
{{maxBlockSize}}, since it is the maximum size of a block.


> WASB: Block compaction for Azure Block Blobs
> --------------------------------------------
>
>                 Key: HADOOP-14520
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14520
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs/azure
>    Affects Versions: 3.0.0-alpha3
>            Reporter: Georgi Chalakov
>            Assignee: Georgi Chalakov
>         Attachments: HADOOP-14520-006.patch, HADOOP-14520-05.patch
>
>
> Block Compaction for WASB allows uploading new blocks for every hflush/hsync 
> call. When the number of blocks is above 32000, the next hflush/hsync triggers 
> the block compaction process. Block compaction replaces a sequence of blocks 
> with one block. From all the sequences with total length less than 4 MB, 
> compaction chooses the longest one. It is a greedy algorithm that preserves 
> all potential candidates for the next round. Block Compaction for WASB 
> increases data durability and allows using block blobs instead of page blobs. 
> By default, block compaction is disabled. Similar to the configuration for 
> page blobs, the client needs to specify the HDFS folders where block compaction 
> over block blobs is enabled. 
> Results for HADOOP-14520-05.patch
> tested endpoint: fs.azure.account.key.hdfs4.blob.core.windows.net
> Tests run: 707, Failures: 0, Errors: 0, Skipped: 119
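
For reference, per-folder enablement would look roughly like this through the 
Java {{Configuration}} API (the key name below is assumed by analogy with the 
page-blob setting {{fs.azure.page.blob.dir}} and may differ in the patch):

{code:java}
import org.apache.hadoop.conf.Configuration;

public class EnableCompaction {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Hypothetical key, modeled on fs.azure.page.blob.dir; check the patch
    // for the actual name introduced for block-blob compaction.
    conf.set("fs.azure.block.blob.with.compaction.dir", "/hflushed");
  }
}
{code}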



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
