[
https://issues.apache.org/jira/browse/HADOOP-12403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Duo Xu updated HADOOP-12403:
----------------------------
Status: In Progress (was: Patch Available)
> Enable multiple writes in flight for HBase WAL writing backed by WASB
> ---------------------------------------------------------------------
>
> Key: HADOOP-12403
> URL: https://issues.apache.org/jira/browse/HADOOP-12403
> Project: Hadoop Common
> Issue Type: Improvement
> Components: fs/azure
> Reporter: Duo Xu
> Assignee: Duo Xu
> Attachments: HADOOP-12403.01.patch, HADOOP-12403.02.patch,
> HADOOP-12403.03.patch
>
>
> Azure HDI HBase clusters use Azure blob storage as their file system. We found
> that the bottleneck is writing to the write-ahead log (WAL). The latest HBase
> WAL write model (HBASE-8755) uses multiple AsyncSyncer threads to sync data
> to HDFS. However, our WASB driver is still based on a single-threaded model,
> so when the sync threads call into the WASB layer, only one thread at a time
> is allowed to send data to Azure Storage. This JIRA introduces a new write
> model in the WASB layer that allows multiple writes in parallel.
> 1. Since we use page blobs for the WAL, this will cause "holes" in the page
> blob, because every write starts on a new page. We use the first two bytes of
> every page to record the actual data size of that page.
> 2. When reading the WAL, we need to know its actual size. This is the sum of
> the numbers stored in the first two bytes of every page. However, looping
> over every page to get the size would be very slow, considering that a normal
> WAL is 128 MB and each page is 512 bytes. So every time a write succeeds, a
> blob metadata entry called "total_data_uploaded" is updated.
> 3. Although we allow multiple writes in flight, we need to make sure the sync
> threads that call into the WASB layer return in order. Reading the HBase
> source file FSHLog.java, we find that every sync request is associated with a
> transaction id. If a sync succeeds, all transactions prior to that
> transaction id are assumed to be in Azure Storage. We use a queue to store
> the sync requests and make sure they return to the HBase layer in order.
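
The page layout in points 1 and 2 can be illustrated with a minimal sketch. It is not taken from the attached patches. The 512-byte page and the two-byte length prefix come from the description above; the class and method names are hypothetical, and the big-endian byte order of the prefix is an assumption. The sketch packs each write into whole pages, then recovers the data size by scanning every page header, which is the slow path that the "total_data_uploaded" metadata is meant to avoid (a 128 MB blob has 262,144 pages of 512 bytes).

```java
import java.io.ByteArrayOutputStream;

// Hypothetical illustration only; not the WASB driver's actual code.
final class PageBlobLayoutSketch {
    static final int PAGE_SIZE = 512;                           // from the issue description
    static final int HEADER_SIZE = 2;                           // first two bytes hold the data size
    static final int PAYLOAD_PER_PAGE = PAGE_SIZE - HEADER_SIZE;

    /** Packs one write into whole pages; the unused tail of the last page is the "hole". */
    static byte[] encodeWrite(byte[] data) {
        int pages = (data.length + PAYLOAD_PER_PAGE - 1) / PAYLOAD_PER_PAGE;
        byte[] out = new byte[pages * PAGE_SIZE];
        for (int p = 0; p < pages; p++) {
            int offset = p * PAYLOAD_PER_PAGE;
            int len = Math.min(PAYLOAD_PER_PAGE, data.length - offset);
            int base = p * PAGE_SIZE;
            out[base] = (byte) (len >>> 8);                     // assumed big-endian length prefix
            out[base + 1] = (byte) len;
            System.arraycopy(data, offset, out, base + HEADER_SIZE, len);
        }
        return out;
    }

    /** Slow path: sums the two-byte header of every page in the blob. */
    static long dataSizeByScan(byte[] blob) {
        long total = 0;
        for (int base = 0; base + PAGE_SIZE <= blob.length; base += PAGE_SIZE) {
            total += ((blob[base] & 0xFF) << 8) | (blob[base + 1] & 0xFF);
        }
        return total;
    }

    /** Reassembles the logical data by skipping headers and padding. */
    static byte[] decode(byte[] blob) {
        ByteArrayOutputStream data = new ByteArrayOutputStream();
        for (int base = 0; base + PAGE_SIZE <= blob.length; base += PAGE_SIZE) {
            int len = ((blob[base] & 0xFF) << 8) | (blob[base + 1] & 0xFF);
            data.write(blob, base + HEADER_SIZE, len);
        }
        return data.toByteArray();
    }
}
```

Under these assumptions a 600-byte write occupies two pages (510 bytes of data in the first, 90 in the second), and a reader that already knows the total from metadata can skip the header scan entirely.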