[ 
https://issues.apache.org/jira/browse/HADOOP-12403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15794782#comment-15794782
 ] 

Hadoop QA commented on HADOOP-12403:
------------------------------------

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m  4s{color} 
| {color:red} HADOOP-12403 does not apply to trunk. Rebase required? Wrong 
Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HADOOP-12403 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12755309/HADOOP-12403.03.patch 
|
| Console output | 
https://builds.apache.org/job/PreCommit-HADOOP-Build/11342/console |
| Powered by | Apache Yetus 0.5.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Enable multiple writes in flight for HBase WAL writing backed by WASB
> ---------------------------------------------------------------------
>
>                 Key: HADOOP-12403
>                 URL: https://issues.apache.org/jira/browse/HADOOP-12403
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs/azure
>            Reporter: Duo Xu
>            Assignee: Duo Xu
>         Attachments: HADOOP-12403.01.patch, HADOOP-12403.02.patch, 
> HADOOP-12403.03.patch
>
>
> Azure HDI HBase clusters use Azure blob storage as file system. We found that 
> the bottle neck was during writing to write ahead log (WAL). The latest HBase 
> WAL write model (HBASE-8755) uses multiple AsyncSyncer threads to sync data 
> to HDFS. However, our WASB driver is still based on a single thread model. 
> Thus when the sync threads call into WASB layer, every time only one thread 
> will be allowed to send data to Azure storage.This jira is to introduce a new 
> write model in WASB layer to allow multiple writes in parallel.
> 1. Since We use page blob for WAL, this will cause "holes" in the page blob 
> as every write starts on a new page. We use the first two bytes of every page 
> to record the actual data size of the current page.
> 2. When reading WAL, we need to know the actual size of the WAL. This should 
> be the sum of the number represented by the first two bytes of every page. 
> However looping over every page to get the size will be very slow, 
> considering normal WAL size is 128MB and each page is 512 bytes. So during 
> writing, every time a write succeeds, a metadata of the blob called 
> "total_data_uploaded" will be updated.
> 3. Although we allow multiple writes in flight, we need to make sure the sync 
> threads which call into WASB layers return in order. Reading HBase source 
> code FSHLog.java, we find that every sync request is associated with a 
> transaction id. If the sync succeeds, all the transactions prior to this 
> transaction id are assumed to be in Azure Storage. We use a queue to store 
> the sync requests and make sure they return to HBase layer in order.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to