[ https://issues.apache.org/jira/browse/HADOOP-12403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15794782#comment-15794782 ]
Hadoop QA commented on HADOOP-12403:
------------------------------------
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 4s{color} | {color:red} HADOOP-12403 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HADOOP-12403 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12755309/HADOOP-12403.03.patch |
| Console output | https://builds.apache.org/job/PreCommit-HADOOP-Build/11342/console |
| Powered by | Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org |
This message was automatically generated.
> Enable multiple writes in flight for HBase WAL writing backed by WASB
> ---------------------------------------------------------------------
>
> Key: HADOOP-12403
> URL: https://issues.apache.org/jira/browse/HADOOP-12403
> Project: Hadoop Common
> Issue Type: Improvement
> Components: fs/azure
> Reporter: Duo Xu
> Assignee: Duo Xu
> Attachments: HADOOP-12403.01.patch, HADOOP-12403.02.patch,
> HADOOP-12403.03.patch
>
>
> Azure HDI HBase clusters use Azure blob storage as the file system. We found that
> the bottleneck was writing to the write-ahead log (WAL). The latest HBase
> WAL write model (HBASE-8755) uses multiple AsyncSyncer threads to sync data
> to HDFS. However, our WASB driver is still based on a single-threaded model.
> Thus, when the sync threads call into the WASB layer, only one thread at a
> time is allowed to send data to Azure storage. This JIRA introduces a new
> write model in the WASB layer that allows multiple writes in parallel.
> 1. Since we use page blobs for the WAL, this will cause "holes" in the page
> blob, as every write starts on a new page. We use the first two bytes of every
> page to record the actual data size of that page.
> 2. When reading the WAL, we need to know its actual size. This should
> be the sum of the numbers represented by the first two bytes of every page.
> However, looping over every page to get the size would be very slow,
> considering that a normal WAL is 128 MB and each page is 512 bytes. So during
> writing, every time a write succeeds, a blob metadata entry called
> "total_data_uploaded" is updated.
> 3. Although we allow multiple writes in flight, we need to make sure the sync
> threads that call into the WASB layer return in order. Reading the HBase
> source code (FSHLog.java), we find that every sync request is associated with
> a transaction id. If the sync succeeds, all the transactions prior to this
> transaction id are assumed to be in Azure Storage. We use a queue to store
> the sync requests and make sure they return to the HBase layer in order.
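The three points above can be illustrated with a small Java sketch. It is not
taken from the attached patches: the class and method names (WalPageSketch,
encode, scanDataSize, OrderedSyncs) are hypothetical, the big-endian header
byte order is an assumption, and a sorted map stands in for the queue that the
description mentions. For scale, a 128 MB WAL with 512-byte pages has 262,144
pages (taking 1 MB as 2^20 bytes), which is why the per-page scan is treated
as the slow path and a running total is kept instead.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Hypothetical sketch; none of these names come from the WASB driver or the patch. */
final class WalPageSketch {
    static final int PAGE_SIZE = 512;                      // page size stated in the description
    static final int HEADER_SIZE = 2;                      // first two bytes hold the data length
    static final int MAX_PAYLOAD = PAGE_SIZE - HEADER_SIZE;

    /** Item 1: pack one write into whole pages, each starting with its own payload length. */
    static byte[] encode(byte[] data) {
        int pages = Math.max(1, (data.length + MAX_PAYLOAD - 1) / MAX_PAYLOAD);
        ByteBuffer out = ByteBuffer.allocate(pages * PAGE_SIZE); // zero-filled, so the tail is a "hole"
        for (int p = 0; p < pages; p++) {
            int off = p * MAX_PAYLOAD;
            int len = Math.min(MAX_PAYLOAD, data.length - off);
            out.position(p * PAGE_SIZE);
            out.putShort((short) len);                     // byte order is an assumption
            out.put(data, off, len);
        }
        return out.array();
    }

    /** Item 2, slow path: sum the two-byte headers of every page. */
    static long scanDataSize(byte[] blob) {
        ByteBuffer in = ByteBuffer.wrap(blob);
        long total = 0;
        for (int pos = 0; pos + PAGE_SIZE <= blob.length; pos += PAGE_SIZE) {
            total += in.getShort(pos) & 0xFFFF;            // read the header as unsigned
        }
        return total;
    }

    /** Item 3: uploads may finish in any order; syncs are released in transaction-id order. */
    static final class OrderedSyncs {
        private final TreeMap<Long, Boolean> pending = new TreeMap<>(); // txid -> upload finished?
        private long totalDataUploaded = 0;                // stand-in for the "total_data_uploaded" metadata

        synchronized void submit(long txid) {
            pending.put(txid, false);
        }

        /** Marks one upload as finished and returns the txids that may now return to the caller. */
        synchronized List<Long> complete(long txid, int bytes) {
            pending.put(txid, true);
            totalDataUploaded += bytes;
            List<Long> released = new ArrayList<>();
            // Release only from the head, so no sync returns before an earlier one.
            while (!pending.isEmpty() && pending.firstEntry().getValue()) {
                released.add(pending.pollFirstEntry().getKey());
            }
            return released;
        }

        synchronized long totalDataUploaded() {
            return totalDataUploaded;
        }
    }
}
```

In this sketch a 600-byte write occupies two pages (510 + 90 bytes of data),
and a sync whose upload finishes early stays pending until every sync with a
lower transaction id has finished as well.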