[ 
https://issues.apache.org/jira/browse/HBASE-16960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-16960:
--------------------------
    Attachment: 16960.ut.missing.final.piece.txt

Add a means of providing your own event handler. Add a handler that unsets the
batching flag and that can, when asked to, throw the socket timeout seen in
this issue. Add a WAL roll listener that runs the log roll in a new thread so
we don't block progress.
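
For anyone not opening the attachment, here is a rough, self-contained sketch of the idea. It is NOT the attached patch and it uses no HBase or Disruptor classes: `Handler`, `FaultInjectingHandler`, `failNextEvent` and `rollInBackground` are hypothetical names made up for illustration. It also assumes that "unsets batching flag" means passing endOfBatch=false to the wrapped handler, which is only one reading.

```java
import java.net.SocketTimeoutException;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

final class WalFaultInjectionSketch {

    /** Hypothetical stand-in for a ring buffer event handler callback. */
    interface Handler<T> {
        void onEvent(T event, long sequence, boolean endOfBatch) throws Exception;
    }

    /** Wraps a delegate, clears the batch flag, and can fail on demand. */
    static final class FaultInjectingHandler<T> implements Handler<T> {
        private final Handler<T> delegate;
        private final AtomicBoolean failNext = new AtomicBoolean(false);

        FaultInjectingHandler(Handler<T> delegate) {
            this.delegate = delegate;
        }

        /** Ask the handler to throw on the next event it sees. */
        void failNextEvent() {
            failNext.set(true);
        }

        @Override
        public void onEvent(T event, long sequence, boolean endOfBatch) throws Exception {
            // One-shot failure: the flag is reset as soon as it fires.
            if (failNext.compareAndSet(true, false)) {
                throw new SocketTimeoutException("Injected failure at sequence " + sequence);
            }
            // Assumption: "unset the batching flag" == always pass endOfBatch=false.
            delegate.onEvent(event, sequence, false);
        }
    }

    /** Runs the roll action on its own daemon thread so the caller is not blocked. */
    static Thread rollInBackground(Runnable roll) {
        Thread t = new Thread(roll, "walroller");
        t.setDaemon(true);
        t.start();
        return t;
    }

    public static void main(String[] args) throws Exception {
        FaultInjectingHandler<String> handler = new FaultInjectingHandler<>(
            (event, sequence, endOfBatch) ->
                System.out.println("appended " + event + " endOfBatch=" + endOfBatch));

        handler.onEvent("edit-1", 1L, true);
        handler.failNextEvent();
        try {
            handler.onEvent("edit-2", 2L, true);
        } catch (SocketTimeoutException e) {
            System.out.println("caught: " + e.getMessage());
        }

        CountDownLatch rolled = new CountDownLatch(1);
        rollInBackground(rolled::countDown);
        System.out.println("rolled=" + rolled.await(5, TimeUnit.SECONDS));
    }
}
```

The failure is one-shot so a test can pick exactly which event fails, and the roll runs on a daemon thread so a stuck roll cannot keep the test JVM alive.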

This is what I'm getting, which is NOT what @binlijin pasted above:

{code}
2016-10-31 16:51:14,362 ERROR [walroller] wal.FSHLog(406): Failed close of WAL writer hdfs://localhost:49209/user/stack/test-data/c4949222-6bc9-417c-9d7c-b361315bfb1d/testStuckAfterAppendException/wal.1477957874194, unflushedEntries=3
org.apache.hadoop.hbase.regionserver.wal.FailedSyncBeforeLogCloseException: org.apache.hadoop.hbase.regionserver.wal.DamagedWALException: On sync
        at org.apache.hadoop.hbase.regionserver.wal.FSHLog$SafePointZigZagLatch.waitSafePoint(FSHLog.java:899)
        at org.apache.hadoop.hbase.regionserver.wal.FSHLog.doReplaceWriter(FSHLog.java:365)
        at org.apache.hadoop.hbase.regionserver.wal.FSHLog.doReplaceWriter(FSHLog.java:74)
        at org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.replaceWriter(AbstractFSWAL.java:641)
        at org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.rollWriter(AbstractFSWAL.java:708)
        at org.apache.hadoop.hbase.regionserver.wal.TestFSHLog$WALRoller$1.run(TestFSHLog.java:194)
Caused by: org.apache.hadoop.hbase.regionserver.wal.DamagedWALException: On sync
        at org.apache.hadoop.hbase.regionserver.wal.FSHLog$RingBufferEventHandler.onEvent(FSHLog.java:1101)
        at org.apache.hadoop.hbase.regionserver.wal.TestFSHLog$BatchManipulatingRingBufferEventHandler.onEvent(TestFSHLog.java:240)
        at org.apache.hadoop.hbase.regionserver.wal.TestFSHLog$BatchManipulatingRingBufferEventHandler.onEvent(TestFSHLog.java:225)
        at com.lmax.disruptor.BatchEventProcessor.run(BatchEventProcessor.java:128)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.SocketTimeoutException: Faked Append Exception!!!!
        at org.apache.hadoop.hbase.regionserver.wal.TestFSHLog$BatchManipulatingRingBufferEventHandler.append(TestFSHLog.java:247)
        at org.apache.hadoop.hbase.regionserver.wal.FSHLog$RingBufferEventHandler.onEvent(FSHLog.java:1051)
        ... 6 more
...

{code}

I'm throwing the exception at the wrong point. I will work more on this ONLY if
wanted.

> RegionServer hang when aborting
> -------------------------------
>
>                 Key: HBASE-16960
>                 URL: https://issues.apache.org/jira/browse/HBASE-16960
>             Project: HBase
>          Issue Type: Bug
>            Reporter: binlijin
>            Assignee: binlijin
>         Attachments: 16960.ut.missing.final.piece.txt, HBASE-16960.patch, 
> HBASE-16960_master_v2.patch, HBASE-16960_master_v3.patch, 
> RingBufferEventHandler.png, RingBufferEventHandler_exception.png, 
> SyncFuture.png, SyncFuture_exception.png, rs1081.jstack
>
>
> We have seen the regionserver hang while aborting several times. This takes
> all regions on that regionserver out of service, and all affected
> applications then stop working.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
