[
https://issues.apache.org/jira/browse/HBASE-11145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14134289#comment-14134289
]
stack commented on HBASE-11145:
-------------------------------
I just saw this in trunk build
https://builds.apache.org/job/HBase-TRUNK/5506/testReport/org.apache.hadoop.hbase.master/TestRollingRestart/testBasicRollingRestart/.
Here is log snippet:
{code}
...
2014-09-15 17:32:48,061 INFO
[asf907.gq1.ygridcore.net,57548,1410802364544-org.apache.hadoop.hbase.master.handler.DisableTableHandler$BulkDisabler-16]
master.RegionStateStore(207): Updating row
tableRestart,t\xE6\xAD\xCA3,1410802367309.27875e0987d3cef64f1ed9d4e7d27984.
with state=PENDING_CLOSE
2014-09-15 17:32:48,061 INFO
[asf907.gq1.ygridcore.net,57548,1410802364544-org.apache.hadoop.hbase.master.handler.DisableTableHandler$BulkDisabler-17]
master.RegionStateStore(207): Updating row
tableRestart,vK\xA0\xF6B,1410802367312.078a7e49e3cd6d0fdf4b41d90fb7c533. with
state=PENDING_CLOSE
2014-09-15 17:32:48,063 ERROR [RS_OPEN_META-asf907:57548-0.append-pool13-t1]
wal.FSHLog$RingBufferEventHandler(1883): UNEXPECTED!!! syncFutures.length=5
java.lang.IllegalStateException: Queue full
at java.util.AbstractQueue.add(AbstractQueue.java:98)
at
org.apache.hadoop.hbase.regionserver.wal.FSHLog$SyncRunner.offer(FSHLog.java:1215)
at
org.apache.hadoop.hbase.regionserver.wal.FSHLog$RingBufferEventHandler.onEvent(FSHLog.java:1879)
at
org.apache.hadoop.hbase.regionserver.wal.FSHLog$RingBufferEventHandler.onEvent(FSHLog.java:1798)
at
com.lmax.disruptor.BatchEventProcessor.run(BatchEventProcessor.java:133)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:724)
2014-09-15 17:32:48,065 DEBUG
[asf907.gq1.ygridcore.net,57548,1410802364544-org.apache.hadoop.hbase.master.handler.DisableTableHandler$BulkDisabler-18]
master.AssignmentManager(1268): Starting unassign of
tableRestart,w\xB0\x94"Q,1410802367316.3d23208c70ac74d7c22ad61a6d9db6f5.
(offlining), current state: {3d23208c70ac74d7c22ad61a6d9db6f5 state=OPEN,
ts=1410802367906, server=asf907.gq1.ygridcore.net,54953,1410802364919}
2014-09-15 17:32:48,065 INFO [PriorityRpcServer.handler=7,queue=0,port=54953]
regionserver.RSRpcServices(929): Close 26dfe46c64d29c8dfe378efdbe31d81f, moving
to null
2014-09-15 17:32:48,065 DEBUG
[asf907.gq1.ygridcore.net,57548,1410802364544-org.apache.hadoop.hbase.master.handler.DisableTableHandler$BulkDisabler-19]
master.AssignmentManager(1268): Starting unassign of
tableRestart,y\x15\x87N`,1410802367319.d1983f1b70a305ffd02421b5d6ae4946.
(offlining), current state: {d1983f1b70a305ffd02421b5d6ae4946 state=OPEN,
ts=1410802367949, server=asf907.gq1.ygridcore.net,53414,1410802365106}
2014-09-15 17:32:48,065 INFO
[asf907.gq1.ygridcore.net,57548,1410802364544-org.apache.hadoop.hbase.master.handler.DisableTableHandler$BulkDisabler-18]
master.RegionStates(951): Transition {3d23208c70ac74d7c22ad61a6d9db6f5
state=OPEN, ts=1410802367906,
server=asf907.gq1.ygridcore.net,54953,1410802364919} to
{3d23208c70ac74d7c22ad61a6d9db6f5 state=PENDING_CLOSE, ts=1410802368065,
server=asf907.gq1.ygridcore.net,54953,1410802364919}
2014-09-15 17:32:48,065 INFO [PriorityRpcServer.handler=4,queue=0,port=55026]
regionserver.RSRpcServices(929): Close 1a8f62b0ca9197d9bbb923f02198eccb, moving
to null
2014-09-15 17:32:48,066 INFO
[asf907.gq1.ygridcore.net,57548,1410802364544-org.apache.hadoop.hbase.master.handler.DisableTableHandler$BulkDisabler-18]
master.RegionStateStore(207): Updating row
tableRestart,w\xB0\x94"Q,1410802367316.3d23208c70ac74d7c22ad61a6d9db6f5. with
state=PENDING_CLOSE
2014-09-15 17:32:48,065 INFO
[asf907.gq1.ygridcore.net,57548,1410802364544-org.apache.hadoop.hbase.master.handler.DisableTableHandler$BulkDisabler-19]
master.RegionStates(951): Transition {d1983f1b70a305ffd02421b5d6ae4946
state=OPEN, ts=1410802367949,
server=asf907.gq1.ygridcore.net,53414,1410802365106} to
{d1983f1b70a305ffd02421b5d6ae4946 state=PENDING_CLOSE, ts=1410802368065,
server=asf907.gq1.ygridcore.net,53414,1410802365106}
2014-09-15 17:32:48,067 INFO
[asf907.gq1.ygridcore.net,57548,1410802364544-org.apache.hadoop.hbase.master.handler.DisableTableHandler$BulkDisabler-19]
master.RegionStateStore(207): Updating row
tableRestart,y\x15\x87N`,1410802367319.d1983f1b70a305ffd02421b5d6ae4946. with
state=PENDING_CLOSE
2014-09-15 17:32:48,067 INFO [PriorityRpcServer.handler=5,queue=0,port=55026]
regionserver.RSRpcServices(929): Close a6b10ca180218b7fab4c70ca2be7cffc, moving
to null
2014-09-15 17:32:48,067 ERROR [RS_OPEN_META-asf907:57548-0.append-pool13-t1]
wal.FSHLog$RingBufferEventHandler(1883): UNEXPECTED!!! syncFutures.length=5
java.lang.IllegalStateException: Queue full
at java.util.AbstractQueue.add(AbstractQueue.java:98)
at
org.apache.hadoop.hbase.regionserver.wal.FSHLog$SyncRunner.offer(FSHLog.java:1215)
at
org.apache.hadoop.hbase.regionserver.wal.FSHLog$RingBufferEventHandler.onEvent(FSHLog.java:1879)
at
org.apache.hadoop.hbase.regionserver.wal.FSHLog$RingBufferEventHandler.onEvent(FSHLog.java:1798)
at
com.lmax.disruptor.BatchEventProcessor.run(BatchEventProcessor.java:133)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:724)
2014-09-15 17:32:48,067 DEBUG [Thread-1] client.HBaseAdmin(995): Sleeping=
100ms, waiting for all regions to be disabled in tableRestart
2014-09-15 17:32:48,068 DEBUG [RS_CLOSE_REGION-asf907:55026-0]
handler.CloseRegionHandler(90): Processing close of
tableRestart,k$\x07\x95\xCA,1410802367278.a6b10ca180218b7fab4c70ca2be7cffc.
2014-09-15 17:32:48,067 ERROR [RS_OPEN_META-asf907:57548-0.append-pool13-t1]
wal.FSHLog$RingBufferEventHandler(1883): UNEXPECTED!!! syncFutures.length=5
java.lang.ArrayIndexOutOfBoundsException: 5
at
org.apache.hadoop.hbase.regionserver.wal.FSHLog$RingBufferEventHandler.onEvent(FSHLog.java:1839)
at
org.apache.hadoop.hbase.regionserver.wal.FSHLog$RingBufferEventHandler.onEvent(FSHLog.java:1798)
at
com.lmax.disruptor.BatchEventProcessor.run(BatchEventProcessor.java:133)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:724)
2014-09-15 17:32:48,069 DEBUG [RS_CLOSE_REGION-asf907:55026-0] regionserve
...
{code}
We so the queue full and then index out of bounds dance.
Let me do as [~anoop.hbase] suggests.
> UNEXPECTED!!! when HLog sync: Queue full
> ----------------------------------------
>
> Key: HBASE-11145
> URL: https://issues.apache.org/jira/browse/HBASE-11145
> Project: HBase
> Issue Type: Bug
> Reporter: Anoop Sam John
> Assignee: stack
> Priority: Critical
> Fix For: 0.99.1
>
> Attachments: 11145.txt
>
>
> Got the below Exceptions Log in case of a write heavy test
> {code}
> 2014-05-07 11:29:56,417 ERROR [main.append-pool1-t1]
> wal.FSHLog$RingBufferEventHandler(1882): UNEXPECTED!!!
> java.lang.IllegalStateException: Queue full
> at java.util.AbstractQueue.add(Unknown Source)
> at
> org.apache.hadoop.hbase.regionserver.wal.FSHLog$SyncRunner.offer(FSHLog.java:1227)
> at
> org.apache.hadoop.hbase.regionserver.wal.FSHLog$RingBufferEventHandler.onEvent(FSHLog.java:1878)
> at
> org.apache.hadoop.hbase.regionserver.wal.FSHLog$RingBufferEventHandler.onEvent(FSHLog.java:1)
> at com.lmax.disruptor.BatchEventProcessor.run(BatchEventProcessor.java:133)
> at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
> at java.lang.Thread.run(Unknown Source)
> 2014-05-07 11:29:56,418 ERROR [main.append-pool1-t1]
> wal.FSHLog$RingBufferEventHandler(1882): UNEXPECTED!!!
> java.lang.ArrayIndexOutOfBoundsException: 5
> at
> org.apache.hadoop.hbase.regionserver.wal.FSHLog$RingBufferEventHandler.onEvent(FSHLog.java:1838)
> at
> org.apache.hadoop.hbase.regionserver.wal.FSHLog$RingBufferEventHandler.onEvent(FSHLog.java:1)
> at com.lmax.disruptor.BatchEventProcessor.run(BatchEventProcessor.java:133)
> at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
> at java.lang.Thread.run(Unknown Source)
> 2014-05-07 11:29:56,419 ERROR [main.append-pool1-t1]
> wal.FSHLog$RingBufferEventHandler(1882): UNEXPECTED!!!
> java.lang.ArrayIndexOutOfBoundsException: 6
> at
> org.apache.hadoop.hbase.regionserver.wal.FSHLog$RingBufferEventHandler.onEvent(FSHLog.java:1838)
> at
> org.apache.hadoop.hbase.regionserver.wal.FSHLog$RingBufferEventHandler.onEvent(FSHLog.java:1)
> at com.lmax.disruptor.BatchEventProcessor.run(BatchEventProcessor.java:133)
> at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
> at java.lang.Thread.run(Unknown Source)
> 2014-05-07 11:29:56,419 ERROR [main.append-pool1-t1]
> wal.FSHLog$RingBufferEventHandler(1882): UNEXPECTED!!!
> java.lang.ArrayIndexOutOfBoundsException: 7
> at
> org.apache.hadoop.hbase.regionserver.wal.FSHLog$RingBufferEventHandler.onEvent(FSHLog.java:1838)
> at
> org.apache.hadoop.hbase.regionserver.wal.FSHLog$RingBufferEventHandler.onEvent(FSHLog.java:1)
> at com.lmax.disruptor.BatchEventProcessor.run(BatchEventProcessor.java:133)
> at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
> at java.lang.Thread.run(Unknown Source)
> {code}
> In FSHLog$SyncRunner.offer we do BlockingQueue.add() which throws Exception
> as it is full. The problem is the below shown catch() we do not do any
> cleanup.
> {code}
> this.syncRunners[index].offer(sequence, this.syncFutures,
> this.syncFuturesCount);
> attainSafePoint(sequence);
> this.syncFuturesCount = 0;
> } catch (Throwable t) {
> LOG.error("UNEXPECTED!!!", t);
> }
> {code}
> syncFuturesCount is not getting reset to 0 and so the subsequent onEvent()
> handling throws ArrayIndexOutOfBoundsException.
> I think we should do the below
> 1. Handle the Exception and call cleanupOutstandingSyncsOnException() as in
> other cases of Exception handling
> 2. Instead of BlockingQueue.add() use offer() (?)
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)