[ 
https://issues.apache.org/jira/browse/HBASE-5898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13484856#comment-13484856
 ] 

ramkrishna.s.vasudevan commented on HBASE-5898:
-----------------------------------------------

I can attach some parts of the thread dump.
bq. Was it a bunch of threads getting same block?
Yes.
bq. The double-checked is probably better anyways but could the issue come back just less frequently after this patch goes in?
I am not sure. We restarted the client twice and the problem still persisted. Later we restarted the RS, and after that we could not reproduce it.
The following pattern repeats many times. We took 3 thread dumps in a span of 2 minutes.
{code}
"IPC Server handler 42 on 60020" daemon prio=10 tid=0x00007f2f38f1a000 nid=0x6c4d in Object.wait() [0x00007f2f33e4f000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        at java.lang.Object.wait(Object.java:485)
        at org.apache.hadoop.hbase.util.IdLock.getLockEntry(IdLock.java:77)
        - locked <0x00000006cc2a7178> (a org.apache.hadoop.hbase.util.IdLock$Entry)
        at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:290)
        at org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.seekToDataBlock(HFileBlockIndex.java:213)
        at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:455)
        at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.reseekTo(HFileReaderV2.java:493)
        at org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseekAtOrAfter(StoreFileScanner.java:242)
        at org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseek(StoreFileScanner.java:167)
        at org.apache.hadoop.hbase.regionserver.NonLazyKeyValueScanner.doRealSeek(NonLazyKeyValueScanner.java:54)
        at org.apache.hadoop.hbase.regionserver.KeyValueHeap.generalizedSeek(KeyValueHeap.java:299)
        at org.apache.hadoop.hbase.regionserver.KeyValueHeap.reseek(KeyValueHeap.java:244)
        at org.apache.hadoop.hbase.regionserver.StoreScanner.reseek(StoreScanner.java:523)
        - locked <0x000000069a665420> (a org.apache.hadoop.hbase.regionserver.StoreScanner)
        at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:399)
        - locked <0x000000069a665420> (a org.apache.hadoop.hbase.regionserver.StoreScanner)
        at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:127)
        at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:3424)
        at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3379)
        - locked <0x000000069a7da458> (a org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl)
        at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3396)
        - locked <0x000000069a7da458> (a org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:2411)
        at sun.reflect.GeneratedMethodAccessor32.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
{code}
Sometimes we could also see the lock being released (IdLock.releaseLockEntry), but across the 3 thread dumps this appeared only once.
{code}
"IPC Server handler 18 on 60020" daemon prio=10 tid=0x00007f2f38ee9800 nid=0x6c35 runnable [0x00007f2f35667000]
   java.lang.Thread.State: RUNNABLE
        at java.lang.Object.notify(Native Method)
        at org.apache.hadoop.hbase.util.IdLock.releaseLockEntry(IdLock.java:108)
        - locked <0x00000006cc2a7178> (a org.apache.hadoop.hbase.util.IdLock$Entry)
        at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:352)
        at org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.seekToDataBlock(HFileBlockIndex.java:213)
        at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:455)
        at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.reseekTo(HFileReaderV2.java:493)
        at org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseekAtOrAfter(StoreFileScanner.java:242)
        at org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseek(StoreFileScanner.java:167)
        at org.apache.hadoop.hbase.regionserver.NonLazyKeyValueScanner.doRealSeek(NonLazyKeyValueScanner.java:54)
        at org.apache.hadoop.hbase.regionserver.KeyValueHeap.generalizedSeek(KeyValueHeap.java:299)
        at org.apache.hadoop.hbase.regionserver.KeyValueHeap.reseek(KeyValueHeap.java:244)
        at org.apache.hadoop.hbase.regionserver.StoreScanner.reseek(StoreScanner.java:523)
        - locked <0x000000069a89d678> (a org.apache.hadoop.hbase.regionserver.StoreScanner)
        at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:399)
        - locked <0x000000069a89d678> (a org.apache.hadoop.hbase.regionserver.StoreScanner)
        at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:127)
        at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:3424)
        at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3379)
        - locked <0x000000069ae0bbb8> (a org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl)
        at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3396)
        - locked <0x000000069ae0bbb8> (a org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:2411)
{code}
All the client threads were hanging at this point:
{code}
"Thread-29" prio=10 tid=0x00007f9f2c549000 nid=0x639f waiting for monitor entry [0x00007f9f2adec000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:955)
        - waiting to lock <0x000000078ba82828> (a java.lang.Object)
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:841)
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:810)
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:942)
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:845)
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:810)
        at org.apache.hadoop.hbase.client.HTable.finishSetup(HTable.java:232)
        at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:172)
        at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:131)
{code}
                
> Consider double-checked locking for block cache lock
> ----------------------------------------------------
>
>                 Key: HBASE-5898
>                 URL: https://issues.apache.org/jira/browse/HBASE-5898
>             Project: HBase
>          Issue Type: Improvement
>          Components: Performance
>    Affects Versions: 0.94.1
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>            Priority: Critical
>             Fix For: 0.94.3, 0.96.0
>
>         Attachments: 5898-TestBlocksRead.txt, HBASE-5898-0.patch, 
> HBASE-5898-1.patch, hbase-5898.txt
>
>
> Running a workload with a high query rate against a dataset that fits in 
> cache, I saw a lot of CPU being used in IdLock.getLockEntry, being called by 
> HFileReaderV2.readBlock. Even though it was all cache hits, it was wasting a 
> lot of CPU doing lock management here. I wrote a quick patch to switch to a 
> double-checked locking and it improved throughput substantially for this 
> workload.
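
For readers unfamiliar with the pattern named in the issue description, here is a rough sketch of a double-checked cache lookup. It is illustrative only and is not the code from the attached patches: the class name DoubleCheckedBlockCache, the loader callback and the loadCount() helper are hypothetical, and a plain per-offset lock object stands in for IdLock. The idea is that a cache hit returns without taking any lock, and only a miss takes the per-block lock and checks the cache a second time.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.LongFunction;

/** Hypothetical sketch of double-checked block lookup; not the HBase implementation. */
public class DoubleCheckedBlockCache {
    // Cached blocks keyed by block offset.
    private final ConcurrentHashMap<Long, byte[]> cache = new ConcurrentHashMap<>();
    // One lock object per offset (assumption: stands in for IdLock entries;
    // this sketch never removes lock objects, which a real cache would need to do).
    private final ConcurrentHashMap<Long, Object> locks = new ConcurrentHashMap<>();
    // Counts how many times the loader actually ran.
    private final AtomicInteger loads = new AtomicInteger();

    public byte[] readBlock(long offset, LongFunction<byte[]> loader) {
        // First check: the cache-hit path takes no lock at all.
        byte[] block = cache.get(offset);
        if (block != null) {
            return block;
        }
        Object lock = locks.computeIfAbsent(offset, k -> new Object());
        synchronized (lock) {
            // Second check: another thread may have loaded the block
            // while this thread was waiting for the lock.
            block = cache.get(offset);
            if (block == null) {
                block = loader.apply(offset);
                loads.incrementAndGet();
                cache.put(offset, block);
            }
        }
        return block;
    }

    public int loadCount() {
        return loads.get();
    }
}
```

With this shape, many threads asking for the same uncached block still queue on one lock, but only the first of them runs the loader; the rest find the block on the second check.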

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
