[
https://issues.apache.org/jira/browse/HBASE-11145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
stack updated HBASE-11145:
--------------------------
Attachment: 11145v2.txt
Added the cleanup [~anoop.hbase] suggested for when the queue is full, so we
should no longer get the secondary index-out-of-bounds exceptions. Also
tripled the queue size, so it is now three times the user handler count. In
the last exception noted above, the sync seems to be run from a meta region
open handler; meta region open handlers do syncs too. Our queue was only the
size of the user-level handler count, which would explain why our count is
off: all user-level handlers are occupied and then meta tries to sync. (It
still seems a little odd that one of the syncer threads would get into this
state. Maybe fixing this will reveal the actual problem; perhaps it is
possible when meta is not yet online.)
[~anoop.hbase], review please? Thanks.
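
To illustrate the sizing argument above, here is a minimal, self-contained sketch. It is not the FSHLog code: the class name `SyncQueueSizingSketch`, the method `allFit`, and the handler counts are made up for illustration. It only shows that a bounded queue sized to the user handler count overflows as soon as one extra syncer (for example a meta region open handler) shows up, while a queue three times that size has headroom.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch; names and numbers are illustrative, not taken from FSHLog.
class SyncQueueSizingSketch {

    // Returns true if every sync request fits into a queue of the given capacity.
    static boolean allFit(int capacity, int userHandlers, int otherSyncers) {
        BlockingQueue<Integer> queue = new LinkedBlockingQueue<>(capacity);
        try {
            for (int i = 0; i < userHandlers + otherSyncers; i++) {
                // add() throws IllegalStateException("Queue full") at capacity.
                queue.add(i);
            }
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        int handlers = 5; // assumed user handler count
        // Queue sized to the user handlers only: one extra syncer overflows it.
        System.out.println(allFit(handlers, handlers, 1));     // prints false
        // Queue sized to three times the handler count: the extra syncer fits.
        System.out.println(allFit(3 * handlers, handlers, 1)); // prints true
    }
}
```

A larger queue only adds headroom; it does not bound the number of non-user syncers, which is why the cleanup on a full queue is still needed.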
> UNEXPECTED!!! when HLog sync: Queue full
> ----------------------------------------
>
> Key: HBASE-11145
> URL: https://issues.apache.org/jira/browse/HBASE-11145
> Project: HBase
> Issue Type: Bug
> Components: wal
> Reporter: Anoop Sam John
> Assignee: stack
> Priority: Critical
> Fix For: 0.99.1
>
> Attachments: 11145.txt, 11145v2.txt
>
>
> Got the exceptions below in the log during a write-heavy test:
> {code}
> 2014-05-07 11:29:56,417 ERROR [main.append-pool1-t1] wal.FSHLog$RingBufferEventHandler(1882): UNEXPECTED!!!
> java.lang.IllegalStateException: Queue full
> at java.util.AbstractQueue.add(Unknown Source)
> at org.apache.hadoop.hbase.regionserver.wal.FSHLog$SyncRunner.offer(FSHLog.java:1227)
> at org.apache.hadoop.hbase.regionserver.wal.FSHLog$RingBufferEventHandler.onEvent(FSHLog.java:1878)
> at org.apache.hadoop.hbase.regionserver.wal.FSHLog$RingBufferEventHandler.onEvent(FSHLog.java:1)
> at com.lmax.disruptor.BatchEventProcessor.run(BatchEventProcessor.java:133)
> at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
> at java.lang.Thread.run(Unknown Source)
> 2014-05-07 11:29:56,418 ERROR [main.append-pool1-t1] wal.FSHLog$RingBufferEventHandler(1882): UNEXPECTED!!!
> java.lang.ArrayIndexOutOfBoundsException: 5
> at org.apache.hadoop.hbase.regionserver.wal.FSHLog$RingBufferEventHandler.onEvent(FSHLog.java:1838)
> at org.apache.hadoop.hbase.regionserver.wal.FSHLog$RingBufferEventHandler.onEvent(FSHLog.java:1)
> at com.lmax.disruptor.BatchEventProcessor.run(BatchEventProcessor.java:133)
> at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
> at java.lang.Thread.run(Unknown Source)
> 2014-05-07 11:29:56,419 ERROR [main.append-pool1-t1] wal.FSHLog$RingBufferEventHandler(1882): UNEXPECTED!!!
> java.lang.ArrayIndexOutOfBoundsException: 6
> at org.apache.hadoop.hbase.regionserver.wal.FSHLog$RingBufferEventHandler.onEvent(FSHLog.java:1838)
> at org.apache.hadoop.hbase.regionserver.wal.FSHLog$RingBufferEventHandler.onEvent(FSHLog.java:1)
> at com.lmax.disruptor.BatchEventProcessor.run(BatchEventProcessor.java:133)
> at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
> at java.lang.Thread.run(Unknown Source)
> 2014-05-07 11:29:56,419 ERROR [main.append-pool1-t1] wal.FSHLog$RingBufferEventHandler(1882): UNEXPECTED!!!
> java.lang.ArrayIndexOutOfBoundsException: 7
> at org.apache.hadoop.hbase.regionserver.wal.FSHLog$RingBufferEventHandler.onEvent(FSHLog.java:1838)
> at org.apache.hadoop.hbase.regionserver.wal.FSHLog$RingBufferEventHandler.onEvent(FSHLog.java:1)
> at com.lmax.disruptor.BatchEventProcessor.run(BatchEventProcessor.java:133)
> at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
> at java.lang.Thread.run(Unknown Source)
> {code}
> In FSHLog$SyncRunner.offer we call BlockingQueue.add(), which throws an
> IllegalStateException when the queue is full. The problem is that the catch
> block shown below does not do any cleanup.
> {code}
>   this.syncRunners[index].offer(sequence, this.syncFutures, this.syncFuturesCount);
>   attainSafePoint(sequence);
>   this.syncFuturesCount = 0;
> } catch (Throwable t) {
>   LOG.error("UNEXPECTED!!!", t);
> }
> {code}
> syncFuturesCount is not reset to 0, so subsequent onEvent() calls throw
> ArrayIndexOutOfBoundsException.
> I think we should do the following:
> 1. Handle the exception and call cleanupOutstandingSyncsOnException(), as in
> the other cases of exception handling.
> 2. Use offer() instead of BlockingQueue.add() (?)
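
The two proposals above can be sketched roughly as follows. This is a simplified stand-in, not the FSHLog implementation: `SyncBatcherSketch`, `failPending`, and the use of `CompletableFuture` in place of HBase's sync futures are all assumptions made for illustration. The point is that the catch path fails the outstanding futures and resets the counter, so a full queue does not leave the handler indexing past the end of its array on later events.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch; class, method, and field names are not from FSHLog.
class SyncBatcherSketch {
    private final CompletableFuture<?>[] pending;
    private final BlockingQueue<CompletableFuture<?>> runnerQueue;
    private int pendingCount = 0;

    SyncBatcherSketch(int batchSize, int queueCapacity) {
        this.pending = new CompletableFuture<?>[batchSize];
        this.runnerQueue = new LinkedBlockingQueue<>(queueCapacity);
    }

    int pendingCount() {
        return pendingCount;
    }

    // Collects sync requests and hands them off at the end of a batch.
    void onEvent(CompletableFuture<?> syncFuture, boolean endOfBatch) {
        try {
            pending[pendingCount++] = syncFuture;
            if (!endOfBatch) {
                return;
            }
            for (int i = 0; i < pendingCount; i++) {
                // offer() returns false on a full queue instead of throwing.
                if (!runnerQueue.offer(pending[i])) {
                    throw new IllegalStateException("sync queue full");
                }
            }
            pendingCount = 0;
        } catch (Throwable t) {
            // Cleanup on every failure path, so the counter cannot stay stale.
            failPending(t);
        }
    }

    // Fails all outstanding futures (including any already queued) and resets the counter.
    private void failPending(Throwable cause) {
        int n = Math.min(pendingCount, pending.length);
        for (int i = 0; i < n; i++) {
            pending[i].completeExceptionally(cause);
        }
        pendingCount = 0;
    }

    public static void main(String[] args) {
        SyncBatcherSketch batcher = new SyncBatcherSketch(2, 1);
        CompletableFuture<Long> first = new CompletableFuture<>();
        CompletableFuture<Long> second = new CompletableFuture<>();
        batcher.onEvent(first, false);
        batcher.onEvent(second, true); // queue holds one entry, so the second offer fails
        System.out.println(second.isCompletedExceptionally()); // prints true
        System.out.println(batcher.pendingCount());            // prints 0
    }
}
```

Without the reset in the catch path, the counter would keep growing across events and the next array store would eventually go out of bounds, which matches the sequence of exceptions in the log.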