[jira] [Updated] (HADOOP-12403) Enable multiple writes in flight for HBase WAL writing backed by WASB

Duo Xu (JIRA) Tue, 03 Jan 2017 17:03:07 -0800

     [ 
https://issues.apache.org/jira/browse/HADOOP-12403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Duo Xu updated HADOOP-12403:
----------------------------
    Status: In Progress  (was: Patch Available)

> Enable multiple writes in flight for HBase WAL writing backed by WASB
> ---------------------------------------------------------------------
>
>                 Key: HADOOP-12403
>                 URL: https://issues.apache.org/jira/browse/HADOOP-12403
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs/azure
>            Reporter: Duo Xu
>            Assignee: Duo Xu
>         Attachments: HADOOP-12403.01.patch, HADOOP-12403.02.patch, 
> HADOOP-12403.03.patch
>
>
> Azure HDI HBase clusters use Azure blob storage as their file system. We 
> found that the bottleneck is writing to the write-ahead log (WAL). The latest 
> HBase WAL write model (HBASE-8755) uses multiple AsyncSyncer threads to sync 
> data to HDFS. However, our WASB driver is still based on a single-threaded 
> model, so when the sync threads call into the WASB layer, only one thread at 
> a time is allowed to send data to Azure storage. This JIRA introduces a new 
> write model in the WASB layer that allows multiple writes in parallel.
> 1. Since we use page blobs for the WAL, this will cause "holes" in the page 
> blob, because every write starts on a new page. We use the first two bytes of 
> every page to record the actual data size of that page.
> 2. When reading the WAL, we need to know its actual size. This is the sum of 
> the values stored in the first two bytes of every page. However, looping over 
> every page to get the size would be very slow, considering that a normal WAL 
> is 128MB and each page is 512 bytes. So every time a write succeeds, a blob 
> metadata entry called "total_data_uploaded" is updated.
> 3. Although we allow multiple writes in flight, we need to make sure the sync 
> threads that call into the WASB layer return in order. Reading the HBase 
> source code (FSHLog.java), we find that every sync request is associated with 
> a transaction id. If a sync succeeds, all transactions prior to that 
> transaction id are assumed to be in Azure Storage. We use a queue to store 
> the sync requests and make sure they return to the HBase layer in order.
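
Items 1 and 2 of the description can be illustrated with a small in-memory
sketch. It is not taken from the attached patches: the class name
WalPageSketch, its method names, and the big-endian encoding of the two-byte
header are assumptions made for illustration only. It packs each write into
whole 512-byte pages, stores the data length in the first two bytes of each
page, and shows the slow "scan every page" size computation that the
"total_data_uploaded" metadata is meant to avoid. Taking 128MB as
128 x 1024 x 1024 bytes, such a scan would touch 262,144 pages.

```java
import java.nio.ByteBuffer;

/** Hypothetical sketch of the page layout in items 1 and 2; not the patch's code. */
final class WalPageSketch {
    static final int PAGE_SIZE = 512;   // page size stated in the description
    static final int HEADER_SIZE = 2;   // first two bytes hold the page's data length
    static final int MAX_DATA_PER_PAGE = PAGE_SIZE - HEADER_SIZE;

    /** Packs one write into whole pages; the unused tail of the last page is the "hole". */
    static byte[] packWrite(byte[] data) {
        int pages = (data.length + MAX_DATA_PER_PAGE - 1) / MAX_DATA_PER_PAGE;
        ByteBuffer out = ByteBuffer.allocate(pages * PAGE_SIZE); // zero-filled
        int offset = 0;
        for (int p = 0; p < pages; p++) {
            int len = Math.min(MAX_DATA_PER_PAGE, data.length - offset);
            out.position(p * PAGE_SIZE);
            out.putShort((short) len);   // assumption: big-endian header
            out.put(data, offset, len);
            offset += len;
        }
        return out.array();
    }

    /** Reads the two-byte data length stored at the start of the page at pos. */
    static int pageDataLength(byte[] blob, int pos) {
        return ((blob[pos] & 0xFF) << 8) | (blob[pos + 1] & 0xFF);
    }

    /** Slow path: sums the header of every page to find the real data size. */
    static long scanDataSize(byte[] blob) {
        long total = 0;
        for (int pos = 0; pos + PAGE_SIZE <= blob.length; pos += PAGE_SIZE) {
            total += pageDataLength(blob, pos);
        }
        return total;
    }

    /** Recovers the payload by skipping the headers and the padding. */
    static byte[] readData(byte[] blob) {
        ByteBuffer out = ByteBuffer.allocate((int) scanDataSize(blob));
        for (int pos = 0; pos + PAGE_SIZE <= blob.length; pos += PAGE_SIZE) {
            out.put(blob, pos + HEADER_SIZE, pageDataLength(blob, pos));
        }
        return out.array();
    }
}
```

In this sketch a writer would add data.length to a running counter after each
successful packWrite, which plays the role of "total_data_uploaded", so a
reader never needs to call scanDataSize on a full-size log.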
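
The ordering rule in item 3 can be sketched in the same spirit. The class
InOrderSyncQueue and its methods submit and complete are hypothetical names,
and the ids below are a local counter standing in for the HBase transaction
ids; this is an illustration under those assumptions, not the implementation
in the patches. Uploads may finish in any order, but a request is released to
the caller only once every request queued before it has also finished.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of in-order release of sync requests; not the patch's code. */
final class InOrderSyncQueue {
    private final ArrayDeque<Long> pending = new ArrayDeque<>(); // submission order
    private final Set<Long> finished = new HashSet<>();          // done, not yet released
    private long nextId = 0;

    /** Registers a sync request and returns its sequence id (a stand-in for a transaction id). */
    synchronized long submit() {
        long id = nextId++;
        pending.addLast(id);
        return id;
    }

    /**
     * Marks the upload for id as finished and returns the ids that may now
     * return to the caller, oldest first. The list is empty while an older
     * request is still in flight.
     */
    synchronized List<Long> complete(long id) {
        finished.add(id);
        List<Long> released = new ArrayList<>();
        while (!pending.isEmpty() && finished.remove(pending.peekFirst())) {
            released.add(pending.pollFirst());
        }
        return released;
    }
}
```

For example, with three requests submitted as 0, 1 and 2, completing 2 first
releases nothing, completing 0 releases only 0, and completing 1 then releases
1 and 2 together.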



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
