[
https://issues.apache.org/jira/browse/HBASE-14463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14903082#comment-14903082
]
Yu Li commented on HBASE-14463:
-------------------------------
Thanks for the information about HBASE-13903 [~mbertozzi]. :-)
Yes, we focused on almost the same code segment, and the phenomenon is similar
too. The only difference is that in our case I saw lots of handlers waiting for
the entry lock to be released (existing.wait) instead of on map.putIfAbsent. Below
is the jstack I got while encountering the online issue:
{noformat}
"B.DefaultRpcServer.handler=127,queue=10,port=60020" daemon prio=10
tid=0x00007f7556bda800 nid=0x123b1 in Object.wait() [0x00000000449ae000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
at java.lang.Object.wait(Object.java:503)
at org.apache.hadoop.hbase.util.IdLock.getLockEntry(IdLock.java:79)
- locked <0x000000017e2e0980> (a
org.apache.hadoop.hbase.util.IdLock$Entry)
at
org.apache.hadoop.hbase.io.hfile.bucket.BucketCache.getBlock(BucketCache.java:413)
at
org.apache.hadoop.hbase.io.hfile.CombinedBlockCache.getBlock(CombinedBlockCache.java:77)
at
org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:360)
{noformat}
And this is the relevant code:
{code}
73    while ((existing = map.putIfAbsent(entry.id, entry)) != null) {
74      synchronized (existing) {
75        if (existing.isLocked) {
76          ++existing.numWaiters;  // Add ourselves to waiters.
77          while (existing.isLocked) {
78            try {
79              existing.wait();
80            } catch (InterruptedException e) {
{code}
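To make the contention pattern more concrete, here is a rough standalone sketch
(the class name, thread count and sleep are made up for illustration, not taken
from our workload) of many handler threads reading the same block offset through
IdLock; they can only pass getLockEntry one at a time, which matches the jstack
above:
{code}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

import org.apache.hadoop.hbase.util.IdLock;

public class IdLockContentionSketch {
  public static void main(String[] args) throws Exception {
    final IdLock offsetLock = new IdLock();
    final long hotOffset = 42L;      // every handler reads the same block offset
    ExecutorService handlers = Executors.newFixedThreadPool(128);
    for (int i = 0; i < 128; i++) {
      handlers.submit(new Runnable() {
        @Override
        public void run() {
          IdLock.Entry lockEntry = null;
          try {
            // All readers of the same id serialize here: one thread holds the
            // entry, the rest park in existing.wait() as in the jstack above.
            lockEntry = offsetLock.getLockEntry(hotOffset);
            Thread.sleep(1);         // stand-in for ioEngine.read + deserialize
          } catch (Exception e) {
            Thread.currentThread().interrupt();
          } finally {
            if (lockEntry != null) {
              offsetLock.releaseLockEntry(lockEntry);
            }
          }
        }
      });
    }
    handlers.shutdown();
    handlers.awaitTermination(1, TimeUnit.MINUTES);
  }
}
{code}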
Regarding making IdLock a write-only read/write lock: I thought about using
IdReadWriteLock to fully replace IdLock, but I saw it is also referenced from
MobFileCache besides BucketCache. I guess MobFileCache could benefit as well
since file open/close is also expensive, but I'm not that sure since I have never
tried MobFileCache in a real environment. Also, I'm not sure whether we should
avoid evicting a block while other threads are still reading it; if so, I guess
we still need both read and write locks rather than a write-only lock. A rough
sketch of what I have in mind is below.
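Just to illustrate the direction (a rough sketch with made-up names, not the code
in the attached patch): an id-keyed ReentrantReadWriteLock lets concurrent readers
of the same offset proceed in parallel, while eviction still gets exclusivity
through the write lock:
{code}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Sketch only: maps an id (e.g. a bucket offset) to a ReentrantReadWriteLock.
// Entry cleanup/pooling is deliberately omitted here.
public class IdReadWriteLockSketch {
  private final ConcurrentMap<Long, ReentrantReadWriteLock> locks =
      new ConcurrentHashMap<Long, ReentrantReadWriteLock>();

  public ReentrantReadWriteLock getLock(long id) {
    ReentrantReadWriteLock existing = locks.get(id);
    if (existing != null) {
      return existing;
    }
    ReentrantReadWriteLock created = new ReentrantReadWriteLock();
    existing = locks.putIfAbsent(id, created);
    return existing != null ? existing : created;
  }
}
{code}
And the read path in BucketCache#getBlock would then take the read lock instead of
the exclusive entry lock, while eviction takes the write lock, roughly:
{code}
// hypothetical usage, not the actual patch
ReentrantReadWriteLock lock = idReadWriteLock.getLock(bucketEntry.offset());
lock.readLock().lock();
try {
  // ioEngine.read(...): many handlers can be here concurrently for the same offset
} finally {
  lock.readLock().unlock();
}
// eviction of the same offset would use lock.writeLock() for exclusivity
{code}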
[[email protected]] could you give some comments here about the
MobFileCache case?
> Severe performance downgrade when parallel reading a single key from
> BucketCache
> --------------------------------------------------------------------------------
>
> Key: HBASE-14463
> URL: https://issues.apache.org/jira/browse/HBASE-14463
> Project: HBase
> Issue Type: Bug
> Affects Versions: 0.98.14, 1.1.2
> Reporter: Yu Li
> Assignee: Yu Li
> Fix For: 2.0.0, 1.3.0
>
> Attachments: HBASE-14463.patch, TestBucketCache_with_IdLock.png,
> TestBucketCache_with_IdReadWriteLock.png
>
>
> We store feature data of online items in HBase, do machine learning on these
> features, and supply the outputs to our online search engine. In such a
> scenario we launch hundreds of YARN workers and each worker reads all features
> of one item (i.e. a single rowkey in HBase), so there will be heavy parallel
> reading on a single rowkey.
> We were using LruCache but recently started to try BucketCache to resolve GC
> issues, and just as titled we have observed a severe performance downgrade.
> After some analysis we found the root cause is the lock in
> BucketCache#getBlock, as shown below:
> {code}
>   try {
>     lockEntry = offsetLock.getLockEntry(bucketEntry.offset());
>     // ...
>     if (bucketEntry.equals(backingMap.get(key))) {
>       // ...
>       int len = bucketEntry.getLength();
>       Cacheable cachedBlock = ioEngine.read(bucketEntry.offset(), len,
>           bucketEntry.deserializerReference(this.deserialiserMap));
> {code}
> Since ioEngine.read involves an array copy, it is much more time-consuming than
> the corresponding operation in LruCache. And since we use synchronized in
> IdLock#getLockEntry, parallel reads hitting the same bucket are executed
> serially, which causes really bad performance.
> To resolve the problem, we propose to use ReentrantReadWriteLock in
> BucketCache, and introduce a new class called IdReadWriteLock to implement it.