[ https://issues.apache.org/jira/browse/HBASE-16960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
stack updated HBASE-16960:
--------------------------
Attachment: 16960.ut.missing.final.piece.txt
Add a means of providing our own event handler. Add a handler that unsets the
batching flag and that can throw the socket timeout seen in this issue when
asked to. Add a WAL roll listener that runs the log roll in a new thread so we
don't block progress.
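As a rough illustration of the idea only (this is NOT the code in the attachment), the sketch below uses made-up stand-in types rather than the real FSHLog or Disruptor classes. EventHandlerSketch, RecordingHandler, FaultInjectingHandler, and AsyncRollListener are all hypothetical names. It shows a wrapping handler that always clears the end-of-batch flag and throws a SocketTimeoutException on request, plus a listener that runs the roll on its own thread.

```java
import java.net.SocketTimeoutException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical stand-in for a ring-buffer event handler; not the HBase or Disruptor API. */
interface EventHandlerSketch<T> {
  void onEvent(T event, long sequence, boolean endOfBatch) throws Exception;
}

/** Records each event and the batching flag it was handed, so a test can inspect them. */
final class RecordingHandler implements EventHandlerSketch<String> {
  final List<String> seen = new ArrayList<>();

  @Override
  public void onEvent(String event, long sequence, boolean endOfBatch) {
    seen.add(event + ":" + endOfBatch);
  }
}

/** Wraps another handler, always clears the batching flag, and fails when asked to. */
final class FaultInjectingHandler<T> implements EventHandlerSketch<T> {
  private final EventHandlerSketch<T> delegate;
  /** Set to true to make the next event throw; it resets itself after one throw. */
  final AtomicBoolean throwOnNextEvent = new AtomicBoolean(false);

  FaultInjectingHandler(EventHandlerSketch<T> delegate) {
    this.delegate = delegate;
  }

  @Override
  public void onEvent(T event, long sequence, boolean endOfBatch) throws Exception {
    if (throwOnNextEvent.compareAndSet(true, false)) {
      throw new SocketTimeoutException("Faked Append Exception!!!!");
    }
    // Ignore the caller's flag and always pass endOfBatch=false to the delegate.
    delegate.onEvent(event, sequence, false);
  }
}

/** Runs the supplied roll action on a new thread so the requesting thread is not blocked. */
final class AsyncRollListener {
  private final Runnable roll;

  AsyncRollListener(Runnable roll) {
    this.roll = roll;
  }

  Thread logRollRequested() {
    Thread t = new Thread(roll, "walroller");
    t.setDaemon(true);
    t.start();
    return t;
  }
}
```

In this sketch the exception is thrown before the delegate is called; where exactly to throw it in the real handler is the part I have not got right yet.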
This is what I'm getting, which is NOT what @binlijin pasted above:
{code}
2016-10-31 16:51:14,362 ERROR [walroller] wal.FSHLog(406): Failed close of WAL writer hdfs://localhost:49209/user/stack/test-data/c4949222-6bc9-417c-9d7c-b361315bfb1d/testStuckAfterAppendException/wal.1477957874194, unflushedEntries=3
org.apache.hadoop.hbase.regionserver.wal.FailedSyncBeforeLogCloseException: org.apache.hadoop.hbase.regionserver.wal.DamagedWALException: On sync
    at org.apache.hadoop.hbase.regionserver.wal.FSHLog$SafePointZigZagLatch.waitSafePoint(FSHLog.java:899)
    at org.apache.hadoop.hbase.regionserver.wal.FSHLog.doReplaceWriter(FSHLog.java:365)
    at org.apache.hadoop.hbase.regionserver.wal.FSHLog.doReplaceWriter(FSHLog.java:74)
    at org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.replaceWriter(AbstractFSWAL.java:641)
    at org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.rollWriter(AbstractFSWAL.java:708)
    at org.apache.hadoop.hbase.regionserver.wal.TestFSHLog$WALRoller$1.run(TestFSHLog.java:194)
Caused by: org.apache.hadoop.hbase.regionserver.wal.DamagedWALException: On sync
    at org.apache.hadoop.hbase.regionserver.wal.FSHLog$RingBufferEventHandler.onEvent(FSHLog.java:1101)
    at org.apache.hadoop.hbase.regionserver.wal.TestFSHLog$BatchManipulatingRingBufferEventHandler.onEvent(TestFSHLog.java:240)
    at org.apache.hadoop.hbase.regionserver.wal.TestFSHLog$BatchManipulatingRingBufferEventHandler.onEvent(TestFSHLog.java:225)
    at com.lmax.disruptor.BatchEventProcessor.run(BatchEventProcessor.java:128)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.SocketTimeoutException: Faked Append Exception!!!!
    at org.apache.hadoop.hbase.regionserver.wal.TestFSHLog$BatchManipulatingRingBufferEventHandler.append(TestFSHLog.java:247)
    at org.apache.hadoop.hbase.regionserver.wal.FSHLog$RingBufferEventHandler.onEvent(FSHLog.java:1051)
    ... 6 more
...
{code}
I'm throwing the exception at the wrong point. I will work more on this ONLY if wanted.
> RegionServer hang when aborting
> -------------------------------
>
> Key: HBASE-16960
> URL: https://issues.apache.org/jira/browse/HBASE-16960
> Project: HBase
> Issue Type: Bug
> Reporter: binlijin
> Assignee: binlijin
> Attachments: 16960.ut.missing.final.piece.txt, HBASE-16960.patch,
> HBASE-16960_master_v2.patch, HBASE-16960_master_v3.patch,
> RingBufferEventHandler.png, RingBufferEventHandler_exception.png,
> SyncFuture.png, SyncFuture_exception.png, rs1081.jstack
>
>
> We have several times seen a regionserver hang while aborting. This takes all
> regions on that regionserver out of service, and all affected applications
> then stop working.