[ 
https://issues.apache.org/jira/browse/HBASE-16960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15624799#comment-15624799
 ] 

Yu Li commented on HBASE-16960:
-------------------------------

Wow, clever method to reproduce the issue [~aoxiang]!

Skimmed the patch, overall LGTM, some minor comments:
1. Add some comments about the steps of the case, something like:
{code}
  /**
   * Reproduce the lockup that happens when no further syncs arrive after an append fails,
   * leaving an isolated sync that waits forever. See HBASE-16960. If the fix is broken, this
   * test will time out because it is locked up.
   * <p/>
   * Steps to reproduce:<br/>
   * 1. Trigger server abort through dodgyWAL1<br/>
   * 2. Add a {@link DummyWALActionsListener} to dodgyWAL2 so that the ringbuffer event handler
   * thread sleeps for a while, thus keeping {@code endOfBatch} false<br/>
   * 3. Publish a sync and then an append that throws an exception, then check whether the sync
   * returns
   */
  @Test(timeout = 20000)
  public void testLockup16960() throws IOException {
{code}

2. Add some comments around {{DummyWALActionsListener}} for better understanding, like:
{code}
    // Add a listener to force the ringbuffer event handler to sleep for a while
    dodgyWAL2.registerWALActionsListener(new DummyWALActionsListener());
{code}
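
For readers of the test, the scenario in the steps above can be modelled in a few lines of plain Java. This is only a rough sketch under assumptions, not the FSHLog code: {{ToyHandler}}, {{failSyncsOnAppendError}} and {{syncReturns}} are hypothetical names, and the model assumes that a queued sync is released only at the end of a batch unless the handler explicitly fails it when an append throws.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class IsolatedSyncSketch {

  /** Toy stand-in for a batching ring buffer event handler. Hypothetical, not HBase code. */
  static final class ToyHandler {
    private final List<CompletableFuture<Void>> pendingSyncs = new ArrayList<>();
    private final boolean failSyncsOnAppendError;

    ToyHandler(boolean failSyncsOnAppendError) {
      this.failSyncsOnAppendError = failSyncsOnAppendError;
    }

    /** Queue a sync; it is only released here when the batch has ended. */
    void onSync(CompletableFuture<Void> sync, boolean endOfBatch) {
      pendingSyncs.add(sync);
      if (endOfBatch) {
        for (CompletableFuture<Void> f : pendingSyncs) {
          f.complete(null);
        }
        pendingSyncs.clear();
      }
    }

    /** Handle an append; on failure, optionally fail every queued sync. */
    void onAppend(boolean appendFails) {
      if (!appendFails) {
        return;
      }
      if (failSyncsOnAppendError) {
        for (CompletableFuture<Void> f : pendingSyncs) {
          f.completeExceptionally(new IOException("append failed"));
        }
        pendingSyncs.clear();
      }
      // Otherwise return without touching the queued syncs (the assumed lockup case).
    }
  }

  /** Returns true if the sync comes back (normally or with an error) before the timeout. */
  static boolean syncReturns(boolean failSyncsOnAppendError) {
    ToyHandler handler = new ToyHandler(failSyncsOnAppendError);
    CompletableFuture<Void> sync = new CompletableFuture<>();
    handler.onSync(sync, false); // endOfBatch stays false, so the sync is only queued
    handler.onAppend(true);      // the append fails and no further sync is published
    try {
      sync.get(200, TimeUnit.MILLISECONDS);
      return true;
    } catch (ExecutionException e) {
      return true;  // the sync returned, carrying the append failure
    } catch (TimeoutException e) {
      return false; // the sync is isolated and would wait forever
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
  }

  public static void main(String[] args) {
    System.out.println("without cleanup, sync returns: " + syncReturns(false));
    System.out.println("with cleanup, sync returns: " + syncReturns(true));
  }
}
```

In this model the first call times out and the second returns with the append failure. The timed wait plays the same role as the {{@Test(timeout = 20000)}} annotation in the suggested test: a lockup shows up as a timeout rather than a hung run.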

Good job!

> RegionServer hang when aborting
> -------------------------------
>
>                 Key: HBASE-16960
>                 URL: https://issues.apache.org/jira/browse/HBASE-16960
>             Project: HBase
>          Issue Type: Bug
>            Reporter: binlijin
>            Assignee: binlijin
>         Attachments: 16960.ut.missing.final.piece.txt, HBASE-16960.patch, 
> HBASE-16960_master_v2.patch, HBASE-16960_master_v3.patch, 
> RingBufferEventHandler.png, RingBufferEventHandler_exception.png, 
> SyncFuture.png, SyncFuture_exception.png, rs1081.jstack
>
>
> We have several times seen a regionserver hang while aborting, which takes 
> all regions on that regionserver out of service and causes all affected 
> applications to stop working.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
