[jira] [Commented] (HBASE-16698) Performance issue: handlers stuck waiting for CountDownLatch inside WALKey#getWriteEntry under high writing workload

stack (JIRA) Thu, 13 Oct 2016 17:29:31 -0700

    [ 
https://issues.apache.org/jira/browse/HBASE-16698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15573654#comment-15573654
 ]


stack commented on HBASE-16698:
-------------------------------

Committed below addendum to address this FindBugs complaint:


Code    Warning
UL      
org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchMutation(HRegion$BatchOperationInProgress)
 does not release lock on all paths
Bug type UL_UNRELEASED_LOCK (click for details) 
In class org.apache.hadoop.hbase.regionserver.HRegion
In method 
org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchMutation(HRegion$BatchOperationInProgress)
At HRegion.java:[line 3313]


stack-MBP:hbase stack$ git show -1
commit e1923b7c0c14b435ea0d9eb306d968f1927a0c6e
Author: Michael Stack <st...@apache.org>
Date:   Thu Oct 13 17:16:47 2016 -0700

    HBASE-16698 Performance issue: handlers stuck waiting for CountDownLatch 
inside WALKey#getWriteEntry under high writing workload; ADDENDUM. Fix findbugs

diff --git 
a/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 
b/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
index 3715ca1..a486599 100644
--- 
a/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
+++ 
b/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
@@ -3310,7 +3310,6 @@ public class HRegion implements HeapSize, 
PropagatingConfigurationObserver, Regi
         this.mvcc.advanceTo(batchOp.getReplaySequenceId());
       } else {
         // writeEntry won't be empty if not in replay mode
-        assert writeEntry != null;
         mvcc.completeAndWait(writeEntry);
         writeEntry = null;
       }

@appy kicked me for committing w/ a FindBugs....  (Thanks @appy)


> Performance issue: handlers stuck waiting for CountDownLatch inside 
> WALKey#getWriteEntry under high writing workload
> --------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-16698
>                 URL: https://issues.apache.org/jira/browse/HBASE-16698
>             Project: HBase
>          Issue Type: Improvement
>          Components: Performance
>    Affects Versions: 1.1.6, 1.2.3
>            Reporter: Yu Li
>            Assignee: Yu Li
>             Fix For: 2.0.0
>
>         Attachments: HBASE-16698.branch-1.patch, HBASE-16698.patch, 
> HBASE-16698.v2.patch, hadoop0495.et2.jstack
>
>
> As titled, on our production environment we observed 98 out of 128 handlers 
> get stuck waiting for the CountDownLatch {{seqNumAssignedLatch}} inside 
> {{WALKey#getWriteEntry}} under a high writing workload.
> After digging into the problem, we found that the problem is mainly caused by 
> advancing mvcc in the append logic. Below is some detailed analysis:
> Under current branch-1 code logic, all batch puts will call 
> {{WALKey#getWriteEntry}} after appending edit to WAL, and 
> {{seqNumAssignedLatch}} is only released when the relative append call is 
> handled by RingBufferEventHandler (see {{FSWALEntry#stampRegionSequenceId}}). 
> Because currently we're using a single event handler for the ringbuffer, the 
> append calls are handled one by one (actually lot's of our current logic 
> depending on this sequential dealing logic), and this becomes a bottleneck 
> under high writing workload.
> The worst part is that by default we only use one WAL per RS, so appends on 
> all regions are dealt with in sequential, which causes contention among 
> different regions...
> To fix this, we could also take use of the "sequential appends" mechanism, 
> that we could grab the WriteEntry before publishing append onto ringbuffer 
> and use it as sequence id, only that we need to add a lock to make "grab 
> WriteEntry" and "append edit" a transaction. This will still cause contention 
> inside a region but could avoid contention between different regions. This 
> solution is already verified in our online environment and proved to be 
> effective.
> Notice that for master (2.0) branch since we already change the write 
> pipeline to sync before writing memstore (HBASE-15158), this issue only 
> exists for the ASYNC_WAL writes scenario.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HBASE-16698) Performance issue: handlers stuck waiting for CountDownLatch inside WALKey#getWriteEntry under high writing workload

Reply via email to