[ https://issues.apache.org/jira/browse/HBASE-19389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16420156#comment-16420156 ]
stack commented on HBASE-19389:
-------------------------------

[~chancelq] I just came across this bit of code. A few items.

The RN says the default is 10 but in code it is 50 (I changed the RN)? Also, we are keeping counters even though they are only used to emit a log IFF trace is enabled. Should we rather not keep the counters at all unless trace is enabled? For example, I think the code should look like this:

{code}
@@ -705,10 +722,10 @@ public class HStore implements Store, HeapSize, StoreConfigInformation, Propagat
   public void add(final Cell cell, MemStoreSizing memstoreSizing) {
     lock.readLock().lock();
     try {
-      if (this.currentParallelPutCount.getAndIncrement() > this.parallelPutCountPrintThreshold) {
-        if (LOG.isTraceEnabled()) {
-          LOG.trace(this.getTableName() + ":" + this.getRegionInfo().getEncodedName() + ":" + this
-            .getColumnFamilyName() + " too Busy!");
+      if (LOG.isTraceEnabled()) {
+        if (this.currentParallelPutCount.getAndIncrement() > this.parallelPutCountPrintThreshold) {
+          LOG.trace("tableName={}, encodedName={}, columnFamilyName={} is too busy!",
+            this.getTableName(), this.getRegionInfo().getEncodedName(), this.getColumnFamilyName());
         }
       }
{code}

Another thing: I see this as a useful debugging tool. An operator is concerned that there is too much parallelism going on... that clients are backed up on a Cell or a Store, but they don't know which one. If they dynamically enable TRACE-level on HStore, then if there are Stores with parallelism in excess of the threshold here, they'll see a log identifying the victim. Is this how you folks used it? If so, I'll add it to the RN and make an addition to the refguide in a follow-on. Good stuff lads.
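The guard-order point above can be sketched standalone. This is a minimal illustration, not HBase code: {{ParallelPutGate}}, {{beginPut}}/{{endPut}}, and the {{traceEnabled}} flag are hypothetical stand-ins for HStore's counter and {{LOG.isTraceEnabled()}}; the threshold semantics follow the snippet.

```java
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Minimal sketch of the guard order proposed above: check the log level
 * first, and only then pay for the AtomicInteger traffic on the put path.
 * All names here are hypothetical stand-ins, not HBase API.
 */
public class ParallelPutGate {
    private final AtomicInteger currentParallelPutCount = new AtomicInteger();
    private final int parallelPutCountPrintThreshold;
    private final boolean traceEnabled; // stand-in for LOG.isTraceEnabled()

    public ParallelPutGate(int threshold, boolean traceEnabled) {
        this.parallelPutCountPrintThreshold = threshold;
        this.traceEnabled = traceEnabled;
    }

    /** Called on entry to add(); returns true iff a trace line would be emitted. */
    public boolean beginPut() {
        if (!traceEnabled) {
            return false; // counter is never touched when TRACE is off
        }
        return currentParallelPutCount.getAndIncrement() > parallelPutCountPrintThreshold;
    }

    /** Called in the finally block of add() to undo the increment. */
    public void endPut() {
        if (traceEnabled) {
            currentParallelPutCount.decrementAndGet();
        }
    }
}
```

With TRACE off, the hot put path skips the atomic increment entirely; with TRACE on, concurrent puts beyond the threshold trip the "too busy" log, which is what makes it usable as the dynamic debugging tool described above.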
> Limit concurrency of put with dense (hundreds) columns to prevent write
> handler exhausted
> -----------------------------------------------------------------------------------------
>
>                 Key: HBASE-19389
>                 URL: https://issues.apache.org/jira/browse/HBASE-19389
>             Project: HBase
>          Issue Type: Improvement
>          Components: Performance
>    Affects Versions: 2.0.0
>         Environment: 2000+ Region Servers
>                      PCI-E ssd
>            Reporter: Chance Li
>            Assignee: Chance Li
>            Priority: Critical
>             Fix For: 3.0.0, 2.1.0
>
>         Attachments: CSLM-concurrent-write.png,
> HBASE-19389-branch-2-V10.patch, HBASE-19389-branch-2-V2.patch,
> HBASE-19389-branch-2-V3.patch, HBASE-19389-branch-2-V4.patch,
> HBASE-19389-branch-2-V5.patch, HBASE-19389-branch-2-V6.patch,
> HBASE-19389-branch-2-V7.patch, HBASE-19389-branch-2-V8.patch,
> HBASE-19389-branch-2-V9.patch, HBASE-19389-branch-2.patch,
> HBASE-19389.master.patch, HBASE-19389.master.v2.patch, metrics-1.png,
> ycsb-result.png
>
> In a large cluster with a large number of clients, we found the RS's
> handlers are all busy sometimes. After investigation we found the root
> cause is in CSLM, e.g. heavy load on its compare function. We reviewed the
> related WALs and found that many columns (more than 1000) were being
> written at that time.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)