[ https://issues.apache.org/jira/browse/HBASE-11145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-11145:
--------------------------
    Attachment: 11145v2.txt

So, added the cleanup [~anoop.hbase] suggested for when the queue is full, so we
should no longer get the follow-on ArrayIndexOutOfBoundsException. Also tripled
the queue size so it is three times the user handler count. In the last
exception noted above, the sync seems to be run from a meta region open
handler; meta region open handlers do syncs too. Our queue is sized only to the
user-level handler count, which would explain why our count is off: all
user-level handler slots are filled and then meta tries to sync. (It still
seems a little odd that one of the syncer threads would get into this state.
Maybe fixing this will reveal the actual problem; perhaps it is possible when
meta is not yet online.)
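The sizing argument can be illustrated with a plain bounded queue. This is a minimal sketch, not FSHLog code. It assumes each user-level handler has at most one sync outstanding, and it models the meta region open handler as one extra syncer. QueueSizingSketch and outstandingSyncsFit are hypothetical names, and ArrayBlockingQueue stands in for the SyncRunner's queue.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical illustration, not FSHLog code: a bounded queue standing in for a
// SyncRunner's queue, receiving one pending sync per outstanding caller.
final class QueueSizingSketch {

  /**
   * Returns true if every outstanding sync fits in a queue of the given capacity.
   * Each of the userHandlers callers has one sync pending; extraSyncers models
   * callers that are not user-level handlers (e.g. a meta region open handler).
   */
  static boolean outstandingSyncsFit(int capacity, int userHandlers, int extraSyncers) {
    BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(capacity);
    try {
      for (int i = 0; i < userHandlers + extraSyncers; i++) {
        queue.add(i); // add() throws IllegalStateException("Queue full") when no space is left
      }
      return true;
    } catch (IllegalStateException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    int handlers = 30;
    // Queue sized to the user handler count: one extra syncer overflows it.
    System.out.println(outstandingSyncsFit(handlers, handlers, 1));     // false
    // Queue sized to three times the handler count, as in the v2 patch.
    System.out.println(outstandingSyncsFit(3 * handlers, handlers, 1)); // true
  }
}
```

Tripling the capacity adds headroom for syncers that are not user-level handlers. It does not bound how many such syncers there can be, which is why the cleanup on a full queue matters as well.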

Review [~anoop.hbase] please?  Thanks.

> UNEXPECTED!!! when HLog sync: Queue full
> ----------------------------------------
>
>                 Key: HBASE-11145
>                 URL: https://issues.apache.org/jira/browse/HBASE-11145
>             Project: HBase
>          Issue Type: Bug
>          Components: wal
>            Reporter: Anoop Sam John
>            Assignee: stack
>            Priority: Critical
>             Fix For: 0.99.1
>
>         Attachments: 11145.txt, 11145v2.txt
>
>
> Got the exceptions below in the log during a write-heavy test:
> {code}
> 2014-05-07 11:29:56,417 ERROR [main.append-pool1-t1] wal.FSHLog$RingBufferEventHandler(1882): UNEXPECTED!!!
> java.lang.IllegalStateException: Queue full
>  at java.util.AbstractQueue.add(Unknown Source)
>  at org.apache.hadoop.hbase.regionserver.wal.FSHLog$SyncRunner.offer(FSHLog.java:1227)
>  at org.apache.hadoop.hbase.regionserver.wal.FSHLog$RingBufferEventHandler.onEvent(FSHLog.java:1878)
>  at org.apache.hadoop.hbase.regionserver.wal.FSHLog$RingBufferEventHandler.onEvent(FSHLog.java:1)
>  at com.lmax.disruptor.BatchEventProcessor.run(BatchEventProcessor.java:133)
>  at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
>  at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
>  at java.lang.Thread.run(Unknown Source)
> 2014-05-07 11:29:56,418 ERROR [main.append-pool1-t1] wal.FSHLog$RingBufferEventHandler(1882): UNEXPECTED!!!
> java.lang.ArrayIndexOutOfBoundsException: 5
>  at org.apache.hadoop.hbase.regionserver.wal.FSHLog$RingBufferEventHandler.onEvent(FSHLog.java:1838)
>  at org.apache.hadoop.hbase.regionserver.wal.FSHLog$RingBufferEventHandler.onEvent(FSHLog.java:1)
>  at com.lmax.disruptor.BatchEventProcessor.run(BatchEventProcessor.java:133)
>  at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
>  at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
>  at java.lang.Thread.run(Unknown Source)
> 2014-05-07 11:29:56,419 ERROR [main.append-pool1-t1] wal.FSHLog$RingBufferEventHandler(1882): UNEXPECTED!!!
> java.lang.ArrayIndexOutOfBoundsException: 6
>  at org.apache.hadoop.hbase.regionserver.wal.FSHLog$RingBufferEventHandler.onEvent(FSHLog.java:1838)
>  at org.apache.hadoop.hbase.regionserver.wal.FSHLog$RingBufferEventHandler.onEvent(FSHLog.java:1)
>  at com.lmax.disruptor.BatchEventProcessor.run(BatchEventProcessor.java:133)
>  at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
>  at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
>  at java.lang.Thread.run(Unknown Source)
> 2014-05-07 11:29:56,419 ERROR [main.append-pool1-t1] wal.FSHLog$RingBufferEventHandler(1882): UNEXPECTED!!!
> java.lang.ArrayIndexOutOfBoundsException: 7
>  at org.apache.hadoop.hbase.regionserver.wal.FSHLog$RingBufferEventHandler.onEvent(FSHLog.java:1838)
>  at org.apache.hadoop.hbase.regionserver.wal.FSHLog$RingBufferEventHandler.onEvent(FSHLog.java:1)
>  at com.lmax.disruptor.BatchEventProcessor.run(BatchEventProcessor.java:133)
>  at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
>  at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
>  at java.lang.Thread.run(Unknown Source)
>  {code}
> In FSHLog$SyncRunner.offer we call BlockingQueue.add(), which throws an
> IllegalStateException when the queue is full. The problem is that the catch()
> shown below does no cleanup.
> {code}
>         this.syncRunners[index].offer(sequence, this.syncFutures, this.syncFuturesCount);
>         attainSafePoint(sequence);
>         this.syncFuturesCount = 0;
>       } catch (Throwable t) {
>         LOG.error("UNEXPECTED!!!", t);
>       }
> {code}
> syncFuturesCount is therefore never reset to 0, so subsequent onEvent()
> handling keeps indexing past the end of the syncFutures array and throws
> ArrayIndexOutOfBoundsException (indices 5, 6, 7 in the log above).
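The failure mode can be reproduced in isolation. This is a minimal sketch. It assumes the handler collects pending syncs into a fixed-size array and hands the batch to a bounded queue with add(). BuggyBatcher, its constructor parameters, and the String placeholders for sync futures are hypothetical and only loosely mirror the snippet above.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical reproduction, not FSHLog code: the catch block logs and swallows
// the exception but never resets syncFuturesCount, so the batch array overflows.
final class BuggyBatcher {
  private final String[] syncFutures;
  private int syncFuturesCount = 0;
  private final BlockingQueue<String> syncQueue;

  BuggyBatcher(int batchSize, int queueCapacity) {
    this.syncFutures = new String[batchSize];
    this.syncQueue = new ArrayBlockingQueue<>(queueCapacity);
  }

  /** Records one pending sync; when endOfBatch is true, hands the batch to the queue. */
  void onEvent(String sync, boolean endOfBatch) {
    try {
      syncFutures[syncFuturesCount++] = sync; // AIOOBE once the count runs past the array
      if (!endOfBatch) {
        return;
      }
      for (int i = 0; i < syncFuturesCount; i++) {
        syncQueue.add(syncFutures[i]); // throws IllegalStateException("Queue full")
      }
      syncFuturesCount = 0; // skipped when add() throws
    } catch (Throwable t) {
      System.err.println("UNEXPECTED!!! " + t); // no cleanup, as in the snippet above
    }
  }

  int pendingCount() {
    return syncFuturesCount;
  }
}
```

Java evaluates the index expression, including the `++`, before the array bounds check, so each later event fails with the next index. If the real handler indexes the same way, that would explain the 5, 6, 7 sequence in the log.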
> I think we should do the following:
> 1. Handle the exception and call cleanupOutstandingSyncsOnException(), as in the other exception-handling cases.
> 2. Use BlockingQueue.offer() instead of add() (?)
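Both suggestions can be sketched together under the same assumptions as a plain batching handler feeding a bounded queue. offer() reports a full queue by returning false rather than throwing. The catch block then fails every pending sync and resets the count, so the next event starts from an empty batch. CompletableFuture stands in for SyncFuture, and failOutstandingSyncs() is a hypothetical stand-in for cleanupOutstandingSyncsOnException(). Neither is the actual FSHLog code.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;

// Hypothetical sketch of the proposed fix, not FSHLog code.
final class SafeBatcher {
  private final CompletableFuture<?>[] syncFutures;
  private int syncFuturesCount = 0;
  private final BlockingQueue<CompletableFuture<?>> syncQueue;

  SafeBatcher(int batchSize, int queueCapacity) {
    this.syncFutures = new CompletableFuture<?>[batchSize];
    this.syncQueue = new ArrayBlockingQueue<>(queueCapacity);
  }

  /** Records one pending sync; when endOfBatch is true, hands the batch to the queue. */
  void onEvent(CompletableFuture<?> sync, boolean endOfBatch) {
    try {
      if (syncFuturesCount == syncFutures.length) {
        // Never index past the end of the batch array.
        throw new IllegalStateException("Batch array full");
      }
      syncFutures[syncFuturesCount++] = sync;
      if (!endOfBatch) {
        return;
      }
      for (int i = 0; i < syncFuturesCount; i++) {
        // offer() returns false instead of throwing when the queue is full.
        if (!syncQueue.offer(syncFutures[i])) {
          throw new IllegalStateException("Sync queue full");
        }
      }
      syncFuturesCount = 0; // whole batch handed off
    } catch (Throwable t) {
      // Fail the incoming sync even if it never made it into the array.
      sync.completeExceptionally(t);
      failOutstandingSyncs(t); // the cleanup the original catch block was missing
    }
  }

  /** Fails every pending sync in the batch and resets the count. */
  private void failOutstandingSyncs(Throwable t) {
    for (int i = 0; i < syncFuturesCount; i++) {
      syncFutures[i].completeExceptionally(t); // no-op for futures already completed
      syncFutures[i] = null;
    }
    syncFuturesCount = 0;
  }

  int pendingCount() {
    return syncFuturesCount;
  }
}
```

Failing the futures that did make it into the queue is the conservative choice in this sketch. Callers see an error and can retry, and any later complete() on those futures is a no-op.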



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
