[
https://issues.apache.org/jira/browse/HBASE-19389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16420156#comment-16420156
]
stack commented on HBASE-19389:
-------------------------------
[~chancelq] I just came across this bit of code. A few items. The RN says the
default is 10 but in code it is 50 (I changed the RN). Also, we are keeping
counters even though they are only there to emit a log IFF trace is enabled.
Should we rather not keep counters unless trace is enabled? For example, I
think the code should look like this:
{code}
@@ -705,10 +722,10 @@ public class HStore implements Store, HeapSize, StoreConfigInformation, Propagat
   public void add(final Cell cell, MemStoreSizing memstoreSizing) {
     lock.readLock().lock();
     try {
-      if (this.currentParallelPutCount.getAndIncrement() > this.parallelPutCountPrintThreshold) {
-        if (LOG.isTraceEnabled()) {
-          LOG.trace(this.getTableName() + ":" + this.getRegionInfo().getEncodedName() + ":"
-              + this.getColumnFamilyName() + " too Busy!");
+      if (LOG.isTraceEnabled()) {
+        if (this.currentParallelPutCount.getAndIncrement() > this.parallelPutCountPrintThreshold) {
+          LOG.trace("tableName={}, encodedName={}, columnFamilyName={} is too busy!",
+              this.getTableName(), this.getRegionInfo().getEncodedName(), this.getColumnFamilyName());
         }
       }
{code}
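To make the suggestion concrete outside of the HStore diff, here is a minimal, self-contained sketch of the same pattern (slf4j logger plus an AtomicInteger; the class, method, and field names are illustrative, not the actual HStore members). The point is that the common, non-TRACE write path never touches the counter:
{code}
import java.util.concurrent.atomic.AtomicInteger;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Illustrative stand-in for the pattern suggested above: the parallel-put
// counter is only maintained when TRACE is enabled, so the normal write
// path skips the atomic operations entirely.
public class TraceGuardedCounter {
  private static final Logger LOG = LoggerFactory.getLogger(TraceGuardedCounter.class);

  private final AtomicInteger currentParallelPutCount = new AtomicInteger(0);
  // 50 matches the in-code default mentioned above; in HStore it is configurable.
  private final int parallelPutCountPrintThreshold = 50;

  public void add(String storeName) {
    // Remember whether we counted this put so we only undo increments we
    // actually made, even if TRACE is toggled while puts are in flight.
    boolean counted = LOG.isTraceEnabled();
    if (counted && currentParallelPutCount.incrementAndGet() > parallelPutCountPrintThreshold) {
      LOG.trace("store={} is too busy!", storeName);
    }
    try {
      // ... the actual memstore write would happen here ...
    } finally {
      if (counted) {
        currentParallelPutCount.decrementAndGet();
      }
    }
  }
}
{code}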
Another thing is that I see this as a useful debugging tool. An operator is
concerned that there is too much parallelism going on... that clients are
backed up writing Cells to a Store but they don't know which one. If they
dynamically enable TRACE-level on HStore, then any Store with parallelism in
excess of the threshold here will log a line identifying the victim. Is this
how you folks used it? If so, I'll add it to the RN and make an addition to
the refguide in a follow-on.
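For the dynamic-TRACE part, the below is roughly what I have in mind. It is only a sketch: it assumes the default Log4j 1.x backend that hbase-2.x ships with, and it would have to run inside the RegionServer JVM; in practice an operator would more likely flip the level from the RegionServer web UI's log-level page or equivalent tooling rather than in code.
{code}
import org.apache.log4j.Level;
import org.apache.log4j.Logger;

// Sketch only: raise org.apache.hadoop.hbase.regionserver.HStore to TRACE so
// the "too busy" lines start showing up, then drop it back when done so the
// write path stops doing the extra counter work. Assumes the Log4j 1.x binding.
public class HStoreTraceToggle {
  private static final String HSTORE = "org.apache.hadoop.hbase.regionserver.HStore";

  public static void enableTrace() {
    Logger.getLogger(HSTORE).setLevel(Level.TRACE);
  }

  public static void restoreInfo() {
    Logger.getLogger(HSTORE).setLevel(Level.INFO);
  }
}
{code}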
Good stuff lads.
> Limit concurrency of put with dense (hundreds) columns to prevent write
> handler exhausted
> -----------------------------------------------------------------------------------------
>
> Key: HBASE-19389
> URL: https://issues.apache.org/jira/browse/HBASE-19389
> Project: HBase
> Issue Type: Improvement
> Components: Performance
> Affects Versions: 2.0.0
> Environment: 2000+ Region Servers
> PCI-E ssd
> Reporter: Chance Li
> Assignee: Chance Li
> Priority: Critical
> Fix For: 3.0.0, 2.1.0
>
> Attachments: CSLM-concurrent-write.png,
> HBASE-19389-branch-2-V10.patch, HBASE-19389-branch-2-V2.patch,
> HBASE-19389-branch-2-V3.patch, HBASE-19389-branch-2-V4.patch,
> HBASE-19389-branch-2-V5.patch, HBASE-19389-branch-2-V6.patch,
> HBASE-19389-branch-2-V7.patch, HBASE-19389-branch-2-V8.patch,
> HBASE-19389-branch-2-V9.patch, HBASE-19389-branch-2.patch,
> HBASE-19389.master.patch, HBASE-19389.master.v2.patch, metrics-1.png,
> ycsb-result.png
>
>
> In a large cluster with a large number of clients, we found that the RS's
> handlers were sometimes all busy. After investigation we found the root
> cause was the CSLM (ConcurrentSkipListMap), e.g. heavy load on its compare
> function. We reviewed the related WALs and found that rows with many columns
> (more than 1000) were being written at that time.