[ https://issues.apache.org/jira/browse/HBASE-2957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12906312#action_12906312 ]

dhruba borthakur commented on HBASE-2957:
-----------------------------------------

I am seeing a typical use case of HBase where the rows of a table are not
equally hot: a few rows are orders of magnitude hotter than all the others.

Each get/put operation in HBase involves the following:
{code}
 put operation                           get operation
 --------------------------------------------------------------
 1. acquire the rowlock
 2. append to hlog
 3. update memstore                      read from memstore
 4. release rowlock
{code}
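In code, the serialized path above might look roughly like this minimal simulation. The `SerializedRow` class and its fields are illustrative stand-ins, not actual HBase internals:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

// Stand-in for one hot row: the rowlock is held across the (slow) hlog
// append, so every put on this row is fully serialized.
class SerializedRow {
    private final ReentrantLock rowLock = new ReentrantLock();
    private final List<Long> hlog = new ArrayList<>(); // stand-in for hlog.append
    private long memstore = 0;                         // stand-in for the memstore cell

    long incrementAndGet(long delta) {
        rowLock.lock();            // 1. acquire the rowlock
        try {
            hlog.add(delta);       // 2. append to hlog (in reality: write to 3 datanodes)
            memstore += delta;     // 3. update memstore
            return memstore;
        } finally {
            rowLock.unlock();      // 4. release the rowlock
        }
    }

    int hlogAppends() { return hlog.size(); } // always one append per put
}
```

Because steps 1-4 all happen under the lock, the hlog append count is necessarily equal to the put count.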

For example, if the application workload consists only of increment operations
on *one* record, then the entire workload is serialized and the throughput is
limited purely by the speed of the append-hlog operation. The number of
hlog.append calls is precisely the same as the number of put calls. This can
be slow, especially because the append operation requires writing to three
datanodes in hdfs.

We can make this workload much faster, while keeping the same data consistency
guarantees, if we can achieve some batching. For each record, let the memstore
contain two versions: one that has been committed to the hlog, and another that
is being updated in memory but has not yet been committed to the hlog. Let's
refer to these two versions of the record as "memstore.inflight" and
"memstore.committed".

{code}
 put operation                                 get operation
 ----------------------------------------------------------------------------------
 1. acquire the rowlock
 2. update memstore.inflight                   read memstore.committed
 3. release the rowlock
 4. append to hlog
 5. memstore.committed = memstore.inflight
{code}
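A rough sketch of that two-version protocol, with the rowlock covering only the in-memory update. All names here are again illustrative rather than actual HBase code, and the sketch assumes syncs for a given row complete in order, as the issue description also requires:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

// Stand-in for one row with two memstore versions: gets read only the
// committed version, and the hlog append happens outside the rowlock.
class TwoVersionRow {
    private final ReentrantLock rowLock = new ReentrantLock();
    private final List<Long> hlog = new ArrayList<>();
    private long inflight = 0;           // memstore.inflight: not yet synced
    private volatile long committed = 0; // memstore.committed: visible to gets

    void put(long delta) {
        long snapshot;
        rowLock.lock();                  // 1. acquire the rowlock
        try {
            inflight += delta;           // 2. update memstore.inflight
            snapshot = inflight;
        } finally {
            rowLock.unlock();            // 3. release the rowlock
        }
        hlog.add(delta);                 // 4. append to hlog, outside the lock
        committed = snapshot;            // 5. memstore.committed = memstore.inflight
    }

    long get() { return committed; }     // reads never see un-synced data
}
```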

The key to the above protocol is that the rowlock is released as soon as the
memstore is updated. This means that multiple calls to put() for the same
record can proceed in parallel, resulting in fewer calls to hlog.append.
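The batching win can be sketched the same way: several puts queue their edits under the rowlock, and a single sync pass commits all of them with one hlog append. The helper names below are hypothetical; a real implementation would run sync() from a log-syncer thread:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

// Stand-in for one hot row with group commit: N puts between syncs
// cost one hlog append instead of N.
class BatchedRow {
    private final ReentrantLock rowLock = new ReentrantLock();
    private final List<Long> pending = new ArrayList<>(); // edits not yet in hlog
    private long inflight = 0;
    private volatile long committed = 0;
    private int hlogAppends = 0;

    void put(long delta) {
        rowLock.lock();
        try {
            inflight += delta;    // update memstore.inflight
            pending.add(delta);   // queue the edit for the next sync
        } finally {
            rowLock.unlock();     // lock released before any hlog activity
        }
    }

    // One hlog append covers every put queued since the last sync.
    void sync() {
        long snapshot;
        rowLock.lock();
        try {
            if (pending.isEmpty()) return;
            pending.clear();
            snapshot = inflight;
        } finally {
            rowLock.unlock();
        }
        hlogAppends++;            // single (slow) append for the whole batch
        committed = snapshot;     // now visible to gets
    }

    long get() { return committed; }
    int appends() { return hlogAppends; }
}
```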

Do people think that this is feasible and beneficial? If so, I can delve deeper 
into the design and implementation of this performance improvement.

> Release row lock when waiting for wal-sync
> ------------------------------------------
>
>                 Key: HBASE-2957
>                 URL: https://issues.apache.org/jira/browse/HBASE-2957
>             Project: HBase
>          Issue Type: Improvement
>          Components: regionserver, wal
>    Affects Versions: 0.20.0
>            Reporter: Prakash Khemani
>
> Is there a reason to hold on to the row-lock while waiting for the WAL-sync 
> to be completed by the logSyncer thread?
> I think data consistency will be guaranteed even if the following happens (a) 
> the row lock is held while the row is updated in memory (b) the row lock is 
> released after queuing the KV record for WAL-syncing (c) the log-sync system 
> guarantees that the log records for any given row are synced in order (d) the 
> HBase client only receives a success notification after the sync completes 
> (no change from the current state)
> I think this should be a huge win. For my use case, and I am sure for others,
> the handler thread spends the bulk of its row-lock critical-section time
> waiting for sync to complete.
> Even if the log-sync system cannot guarantee the orderly completion of sync 
> records, the "Don't hold row lock while waiting for sync" option should be 
> available to HBase clients on a per request basis.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to