[
https://issues.apache.org/jira/browse/HBASE-6561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13432839#comment-13432839
]
Lars Hofhansl commented on HBASE-6561:
--------------------------------------
Found two things:
1. Store.internalFlushCache(...) should be calling
StoreScanner.next(List<KeyValue>, int limit); currently it does not set a
limit. (But this is not the problem.)
2. With jstack I found that the code is stuck in a loop in
MemStore.MemStoreScanner.getNext(...).
Here's the relevant part of the jstack:
{code}
"IPC Server handler 6 on 60020" daemon prio=10 tid=0x00007f0574625000 nid=0x720c runnable [0x00007f05669e7000]
java.lang.Thread.State: RUNNABLE
at org.apache.hadoop.hbase.regionserver.MemStore$MemStoreScanner.getNext(MemStore.java:726)
at org.apache.hadoop.hbase.regionserver.MemStore$MemStoreScanner.seekInSubLists(MemStore.java:761)
- locked <0x00000000c4a8a860> (a org.apache.hadoop.hbase.regionserver.MemStore$MemStoreScanner)
at org.apache.hadoop.hbase.regionserver.MemStore$MemStoreScanner.reseek(MemStore.java:800)
- locked <0x00000000c4a8a860> (a org.apache.hadoop.hbase.regionserver.MemStore$MemStoreScanner)
at org.apache.hadoop.hbase.regionserver.NonLazyKeyValueScanner.doRealSeek(NonLazyKeyValueScanner.java:54)
at org.apache.hadoop.hbase.regionserver.KeyValueHeap.generalizedSeek(KeyValueHeap.java:299)
at org.apache.hadoop.hbase.regionserver.KeyValueHeap.reseek(KeyValueHeap.java:244)
at org.apache.hadoop.hbase.regionserver.StoreScanner.reseek(StoreScanner.java:522)
- eliminated <0x00000000ccb54860> (a org.apache.hadoop.hbase.regionserver.StoreScanner)
at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:403)
- locked <0x00000000ccb54860> (a org.apache.hadoop.hbase.regionserver.StoreScanner)
at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:127)
at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:3459)
at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3406)
- locked <0x00000000c59ee610> (a org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl)
at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3423)
- locked <0x00000000c59ee610> (a org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl)
at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:4171)
at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:4144)
at org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.java:1958)
at sun.reflect.GeneratedMethodAccessor12.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364)
at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1389)
{code}
At the same time I find that flush cannot finish:
{code}
"regionserver60020.cacheFlusher" daemon prio=10 tid=0x00007f05749ab000 nid=0x71fe waiting for monitor entry [0x00007f05677f6000]
java.lang.Thread.State: BLOCKED (on object monitor)
at org.apache.hadoop.hbase.regionserver.StoreScanner.updateReaders(StoreScanner.java:443)
- waiting to lock <0x00000000ccb54860> (a org.apache.hadoop.hbase.regionserver.StoreScanner)
at org.apache.hadoop.hbase.regionserver.Store.notifyChangedReadersObservers(Store.java:904)
at org.apache.hadoop.hbase.regionserver.Store.updateStorefiles(Store.java:893)
at org.apache.hadoop.hbase.regionserver.Store.access$600(Store.java:107)
at org.apache.hadoop.hbase.regionserver.Store$StoreFlusherImpl.commit(Store.java:2291)
at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1455)
at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1353)
at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:1294)
at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:406)
at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:380)
at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:243)
at java.lang.Thread.run(Thread.java:722)
{code}
Both StoreScanner.updateReaders and StoreScanner.reseek are synchronized, so
the flusher cannot proceed while the handler thread holds the StoreScanner
lock (0x00000000ccb54860 in both traces).
So the problem seems to be that MemStoreScanner loops forever in getNext(...).
I took a jstack a bunch of times during execution, and this always shows up.
Need to dig a bit more; I do not see a good way to deal with this yet.
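To illustrate the blocking pattern seen in the two traces, here is a minimal
sketch in plain Java. It is not HBase code: FakeScanner, next, updateReaders,
and release are hypothetical stand-ins, and the only point is that a thread
spinning inside one synchronized method keeps a second thread BLOCKED on
another synchronized method of the same object.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class ScannerLockSketch {
    // Hypothetical stand-in for a scanner whose methods share one monitor.
    static class FakeScanner {
        private final CountDownLatch inLoop = new CountDownLatch(1);
        private volatile boolean stop = false;

        // Plays the role of the handler thread stuck in a long loop.
        synchronized void next() {
            inLoop.countDown();          // signal: the monitor is now held
            while (!stop) {
                Thread.yield();          // spin until released
            }
        }

        // Plays the role of the flusher; needs the same monitor as next().
        synchronized void updateReaders() {
            // nothing to do; acquiring the monitor is the whole point
        }

        void release() {
            stop = true;
        }
    }

    // Returns the flusher thread's state observed while the reader is looping.
    static Thread.State flusherStateWhileReaderLoops() {
        FakeScanner scanner = new FakeScanner();
        Thread reader = new Thread(scanner::next, "reader");
        Thread flusher = new Thread(scanner::updateReaders, "flusher");
        try {
            reader.start();
            scanner.inLoop.await();      // wait until the reader owns the monitor
            flusher.start();
            Thread.State state = flusher.getState();
            long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(5);
            while (state != Thread.State.BLOCKED && System.nanoTime() < deadline) {
                Thread.sleep(10);
                state = flusher.getState();
            }
            scanner.release();           // let the reader exit so both threads finish
            reader.join();
            flusher.join();
            return state;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("flusher state: " + flusherStateWhileReaderLoops());
    }
}
```

In this sketch the reader can be released from outside; in the situation
described above nothing ends the loop, so the flusher would stay blocked
indefinitely.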
> Gets/Puts with many columns send the RegionServer into an "endless" loop
> ------------------------------------------------------------------------
>
> Key: HBASE-6561
> URL: https://issues.apache.org/jira/browse/HBASE-6561
> Project: HBase
> Issue Type: Bug
> Reporter: Lars Hofhansl
> Assignee: Lars Hofhansl
> Fix For: 0.96.0, 0.94.2
>
>
> This came from the mailing list:
> We were able to replicate this behavior in a pseudo-distributed hbase
> (hbase-0.94.1) environment. We wrote a test program that creates a test
> table "MyTestTable" and populates it with random rows, then it creates a
> row with 60,000 columns and repeatedly updates it. Each column has an
> 18-byte qualifier and a 50-byte value. In our tests, when we ran the
> program, we rarely got beyond 15 updates before it would flush
> for a really long time. The rows that are being updated are about 4MB
> each (minus any hbase metadata).
> It doesn't seem like it's caused by GC. I turned on gc logging, and
> didn't see any long pauses. This is the gc log during the flush.
> http://pastebin.com/vJKKXDx5
> This is the regionserver log with debug on during the same flush
> http://pastebin.com/Fh5213mg
> This is the test program we wrote.
> http://pastebin.com/aZ0k5tx2
> You should be able to just compile it, and run it against a running
> HBase cluster.
> $ java TestTable
> Carlos
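As a quick check on the "about 4MB" figure in the description, the payload of
one row can be estimated from the numbers given there (60,000 columns, 18-byte
qualifier, 50-byte value). The class and method names below are illustrative
only, and the estimate ignores row keys, timestamps, and other HBase metadata.

```java
public class RowSizeEstimate {
    // Payload bytes for one row: columns * (qualifier bytes + value bytes).
    static long rowPayloadBytes(int columns, int qualifierBytes, int valueBytes) {
        return (long) columns * (qualifierBytes + valueBytes);
    }

    public static void main(String[] args) {
        long bytes = rowPayloadBytes(60_000, 18, 50);
        // 60,000 * 68 = 4,080,000 bytes
        System.out.println(bytes + " bytes per row (payload only)");
    }
}
```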