[jira] [Updated] (HADOOP-14520) WASB: Block compaction for Azure Block Blobs

Steve Loughran (JIRA) Thu, 31 Aug 2017 13:57:15 -0700

     [ https://issues.apache.org/jira/browse/HADOOP-14520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Steve Loughran updated HADOOP-14520:
------------------------------------
    Attachment: HADOOP-14520-009.patch

Patch 009

* fixed checkstyle complaints in production code
* fixed checkstyle complaints in test src where appropriate
* noticed that {{NativeAzureFsOutputStream.close()}} didn't guarantee that 
{{out == null}} at the end; fixed
* tweaked imports in {{BlockBlobAppendStream}} so that new com.microsoft 
imports are alongside existing ones
* the IDE warned that some codepaths in the test cases might not set the 
variables being tested. Unlikely, but I added asserts so the tests fail 
meaningfully if that happens.
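
For reference, the {{close()}} guarantee mentioned above can be sketched roughly as follows. This is a minimal illustration, not the code in the patch: the class name {{ClosingStreamSketch}} and the {{isClosed()}} helper are hypothetical, and only the "{{out}} is null after {{close()}}, even if the inner close fails" behaviour is taken from the note above.

```java
import java.io.IOException;
import java.io.OutputStream;

/** Hypothetical wrapper stream; NOT the actual NativeAzureFsOutputStream. */
class ClosingStreamSketch extends OutputStream {
  private OutputStream out;

  ClosingStreamSketch(OutputStream out) {
    this.out = out;
  }

  @Override
  public synchronized void write(int b) throws IOException {
    if (out == null) {
      throw new IOException("Stream is closed");
    }
    out.write(b);
  }

  @Override
  public synchronized void close() throws IOException {
    if (out == null) {
      return; // already closed: close() stays idempotent
    }
    try {
      out.close();
    } finally {
      // Runs even when out.close() throws, so the field is always cleared.
      out = null;
    }
  }

  /** Hypothetical helper, present only so the sketch can be checked. */
  synchronized boolean isClosed() {
    return out == null;
  }
}
```

The try/finally is the whole point: an exception from the wrapped stream still propagates to the caller, but a second {{close()}} becomes a no-op.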

Tested: {{TestNativeAzureFileSystemBlockCompaction}} against mock FS. 

I note we don't have any functional tests... 
{{TestNativeAzureFileSystemAppend}} could do this, either as is
or with a subclass which tweaks the config before running the test; it'd need 
some hsync() calls in the various operations too, just to make sure that 
codepath actually gets exercised.

Actually, that raises an interesting question. What if there is more than one 
append stream open, on the same host or on different hosts, and they both 
decide to do a compaction? The API lets them do this, right? 

Trying to acquire a lease for a compaction and failing shouldn't propagate up; 
the failure could just be swallowed and the compaction tried again later.
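
To make that suggestion concrete, here is a rough sketch of "swallow the lease failure and retry later". Everything in it is an assumption for illustration: {{LeaseManager}}, {{tryCompact()}} and {{isCompactionPending()}} are hypothetical names, not the WASB or Azure SDK API, and the real stream would do the retry on a later hflush/hsync.

```java
import java.io.IOException;

/** Hypothetical sketch; none of these names come from the patch. */
class CompactionLeaseSketch {

  /** Stand-in for whatever lease API the store exposes (assumed). */
  interface LeaseManager {
    String acquire() throws IOException; // returns a lease ID
    void release(String leaseId) throws IOException;
  }

  private final LeaseManager leases;
  private boolean compactionPending;

  CompactionLeaseSketch(LeaseManager leases) {
    this.leases = leases;
  }

  /** Returns true if compaction ran, false if it was deferred. */
  boolean tryCompact(Runnable compaction) {
    String leaseId;
    try {
      leaseId = leases.acquire();
    } catch (IOException e) {
      // Another writer probably holds the lease: swallow the failure
      // and remember to try again on a later flush.
      compactionPending = true;
      return false;
    }
    try {
      compaction.run();
      compactionPending = false;
      return true;
    } finally {
      try {
        leases.release(leaseId);
      } catch (IOException ignored) {
        // Best effort in this sketch; real code would at least log it.
      }
    }
  }

  boolean isCompactionPending() {
    return compactionPending;
  }
}
```

With this shape a writer that loses the race simply carries on appending, and only one stream at a time rewrites the block list.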


> WASB: Block compaction for Azure Block Blobs
> --------------------------------------------
>
>                 Key: HADOOP-14520
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14520
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs/azure
>    Affects Versions: 3.0.0-alpha3
>            Reporter: Georgi Chalakov
>            Assignee: Georgi Chalakov
>         Attachments: HADOOP-14520-006.patch, HADOOP-14520-008.patch, 
> HADOOP-14520-009.patch, HADOOP-14520-05.patch, HADOOP_14520_07.patch, 
> HADOOP_14520_08.patch, HADOOP_14520_09.patch, HADOOP-14520-patch-07-08.diff, 
> HADOOP-14520-patch-07-09.diff
>
>
> Block Compaction for WASB allows uploading new blocks for every hflush/hsync 
> call. When the number of blocks is above 32000, the next hflush/hsync triggers 
> the block compaction process. Block compaction replaces a sequence of blocks 
> with one block. From all the sequences with total length less than 4M, 
> compaction chooses the longest one. It is a greedy algorithm that preserves 
> all potential candidates for the next round. Block Compaction for WASB 
> increases data durability and allows using block blobs instead of page blobs. 
> By default, block compaction is disabled. Similar to the configuration for 
> page blobs, the client needs to specify HDFS folders where block compaction 
> over block blobs is enabled. 
> Results for HADOOP_14520_07.patch
> tested endpoint: fs.azure.account.key.hdfs4.blob.core.windows.net
> Tests run: 777, Failures: 0, Errors: 0, Skipped: 155
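
As an illustration of the selection rule in the description above ("from all the sequences with total length less than 4M, choose the longest one"), here is a small sketch. It makes assumptions the description does not spell out: "longest" is read as "most consecutive blocks", "4M" is read as 4 MiB, and the names {{CompactionWindowSketch}} and {{longestRun()}} are hypothetical. It is not the algorithm as implemented in the patch.

```java
/** Hypothetical sketch of picking a run of blocks to compact. */
class CompactionWindowSketch {

  /** Assumed reading of the "4M" limit in the issue description. */
  static final long MAX_COMPACTED_BYTES = 4L * 1024 * 1024;

  /**
   * Finds the longest run of consecutive blocks whose total size is
   * strictly below the limit, using a sliding window.
   *
   * @return {startIndex, blockCount}; blockCount is 0 if no block fits
   */
  static int[] longestRun(long[] blockSizes, long limit) {
    int bestStart = 0;
    int bestCount = 0;
    long sum = 0;
    int left = 0;
    for (int right = 0; right < blockSizes.length; right++) {
      sum += blockSizes[right];
      // Shrink from the left until the window is under the limit again.
      while (sum >= limit && left <= right) {
        sum -= blockSizes[left];
        left++;
      }
      int count = right - left + 1;
      if (count > bestCount) {
        bestCount = count;
        bestStart = left;
      }
    }
    return new int[] {bestStart, bestCount};
  }
}
```

A caller would only compact when the returned count is at least 2, since replacing a single block with itself gains nothing.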


