[ 
https://issues.apache.org/jira/browse/HBASE-19389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16420156#comment-16420156
 ] 

stack commented on HBASE-19389:
-------------------------------

[~chancelq] I just came across this bit of code. A few items. The RN says the 
default is 10 but in the code it is 50 (I changed the RN). Also, we are keeping 
counters even though they only exist to emit a log IFF trace is enabled. Should 
we rather not keep the counters unless trace is enabled? For example, I think 
the code should look like this:

{code}
@@ -705,10 +722,10 @@ public class HStore implements Store, HeapSize, StoreConfigInformation, Propagat
   public void add(final Cell cell, MemStoreSizing memstoreSizing) {
     lock.readLock().lock();
     try {
-      if (this.currentParallelPutCount.getAndIncrement() > this.parallelPutCountPrintThreshold) {
-        if (LOG.isTraceEnabled()) {
-          LOG.trace(this.getTableName() + ":" + this.getRegionInfo().getEncodedName() + ":" + this
-              .getColumnFamilyName() + " too Busy!");
+      if (LOG.isTraceEnabled()) {
+        if (this.currentParallelPutCount.getAndIncrement() > this.parallelPutCountPrintThreshold) {
+          LOG.trace("tableName={}, encodedName={}, columnFamilyName={} is too busy!",
+              this.getTableName(), this.getRegionInfo().getEncodedName(), this.getColumnFamilyName());
         }
       }
{code}
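
To make it concrete, here is a self-contained sketch of the pattern I mean 
(plain SLF4J, not the actual HStore code; the field names are copied from the 
patch for readability, and the balanced decrement in the finally block is my 
assumption about how the other side of the counter should be handled):

{code}
import java.util.concurrent.atomic.AtomicInteger;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class ParallelPutTraceSketch {
  private static final Logger LOG = LoggerFactory.getLogger(ParallelPutTraceSketch.class);

  private final AtomicInteger currentParallelPutCount = new AtomicInteger();
  private final int parallelPutCountPrintThreshold = 50; // in-code default discussed above

  void add(String tableName, String encodedRegionName, String columnFamilyName) {
    // Only touch the counter when TRACE is on; remember the decision so the
    // decrement below stays balanced even if the log level flips mid-call.
    boolean counted = LOG.isTraceEnabled();
    if (counted
        && currentParallelPutCount.getAndIncrement() > parallelPutCountPrintThreshold) {
      LOG.trace("tableName={}, encodedName={}, columnFamilyName={} is too busy!",
          tableName, encodedRegionName, columnFamilyName);
    }
    try {
      // ... the actual memstore add would go here ...
    } finally {
      if (counted) {
        currentParallelPutCount.decrementAndGet();
      }
    }
  }
}
{code}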

Another thing is that I see this as a useful debugging tool. An operator is 
concerned that there is too much parallelism going on... that clients are 
backed up putting Cells to a Store but they don't know which one. If they 
dynamically enable TRACE-level logging on HStore, then any Store with 
parallelism in excess of the threshold here will emit a log identifying the 
victim. Is this how you folks used it? If so, I'll add it to the RN and make an 
addition to the refguide in a follow-on.
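
On the mechanics of flipping the level: hbase-2 ships log4j 1.2, so dynamically 
enabling TRACE on HStore amounts to the call below. This is just a sketch; in 
practice an operator would typically use the RegionServer web UI's log level 
page or edit the log4j configuration rather than run code, and the wrapper 
class here is purely illustrative:

{code}
import org.apache.log4j.Level;
import org.apache.log4j.Logger;

public final class HStoreTraceToggle {
  private static final String HSTORE = "org.apache.hadoop.hbase.regionserver.HStore";

  /** Turn on TRACE so Stores exceeding the parallel-put threshold start logging. */
  public static void enable() {
    Logger.getLogger(HSTORE).setLevel(Level.TRACE);
  }

  /** Revert to the inherited level once the busy Store has been identified. */
  public static void disable() {
    Logger.getLogger(HSTORE).setLevel(null);
  }
}
{code}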

Good stuff lads.

> Limit concurrency of put with dense (hundreds) columns to prevent write 
> handler exhausted
> -----------------------------------------------------------------------------------------
>
>                 Key: HBASE-19389
>                 URL: https://issues.apache.org/jira/browse/HBASE-19389
>             Project: HBase
>          Issue Type: Improvement
>          Components: Performance
>    Affects Versions: 2.0.0
>         Environment: 2000+ Region Servers
> PCI-E ssd
>            Reporter: Chance Li
>            Assignee: Chance Li
>            Priority: Critical
>             Fix For: 3.0.0, 2.1.0
>
>         Attachments: CSLM-concurrent-write.png, 
> HBASE-19389-branch-2-V10.patch, HBASE-19389-branch-2-V2.patch, 
> HBASE-19389-branch-2-V3.patch, HBASE-19389-branch-2-V4.patch, 
> HBASE-19389-branch-2-V5.patch, HBASE-19389-branch-2-V6.patch, 
> HBASE-19389-branch-2-V7.patch, HBASE-19389-branch-2-V8.patch, 
> HBASE-19389-branch-2-V9.patch, HBASE-19389-branch-2.patch, 
> HBASE-19389.master.patch, HBASE-19389.master.v2.patch, metrics-1.png, 
> ycsb-result.png
>
>
> In a large cluster with a large number of clients, we found the RS's 
> handlers are sometimes all busy. After investigation we found the root 
> cause is the CSLM (ConcurrentSkipListMap), e.g. heavy load on its compare 
> function. We reviewed the related WALs and found that many columns (more 
> than 1000) were being written at that time.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)