[
https://issues.apache.org/jira/browse/HBASE-16960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15624799#comment-15624799
]
Yu Li commented on HBASE-16960:
-------------------------------
Wow, clever method to reproduce the issue [~aoxiang]!
Skimmed the patch, overall LGTM, some minor comments:
1. Add some comments about the steps of the case, something like:
{code}
/**
 * Reproduce the lockup that happens when there are no further syncs after an append fails,
 * leaving an isolated sync that then waits forever. See HBASE-16960. If this is broken, we
 * will see this test time out because it is locked up.
 * <p/>
 * Steps to reproduce:<br/>
 * 1. Trigger server abort through dodgyWAL1<br/>
 * 2. Add a {@link DummyWALActionsListener} to dodgyWAL2 to make the ringbuffer event handler
 * thread sleep for a while, thus keeping {@code endOfBatch} false<br/>
 * 3. Publish a sync, then an append which will throw an exception, and check whether the
 * sync can return
 */
@Test(timeout = 20000)
public void testLockup16960() throws IOException {
{code}
2. Add some comments around {{DummyWALActionsListener}} for better understanding, like:
{code}
// Add a listener to force the ringbuffer event handler to sleep for a while
dodgyWAL2.registerWALActionsListener(new DummyWALActionsListener());
{code}
Good job!
> RegionServer hang when aborting
> -------------------------------
>
> Key: HBASE-16960
> URL: https://issues.apache.org/jira/browse/HBASE-16960
> Project: HBase
> Issue Type: Bug
> Reporter: binlijin
> Assignee: binlijin
> Attachments: 16960.ut.missing.final.piece.txt, HBASE-16960.patch,
> HBASE-16960_master_v2.patch, HBASE-16960_master_v3.patch,
> RingBufferEventHandler.png, RingBufferEventHandler_exception.png,
> SyncFuture.png, SyncFuture_exception.png, rs1081.jstack
>
>
> We have seen the regionserver hang while aborting several times. This takes all
> regions on that regionserver out of service, and all affected applications
> stop working.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)