[ https://issues.apache.org/jira/browse/HBASE-16960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15621313#comment-15621313 ]

ramkrishna.s.vasudevan commented on HBASE-16960:
------------------------------------------------

bq. But the problem in this JIRA is a case where there are no further syncs
after an append fails, which leaves an isolated sync that then waits forever.
The proposal tries to clean up previous non-synced syncFutures so that no
isolated one is left behind, without breaking any existing logic.
This is true. In fact, I am also looking into this possibility, but only for
the AsyncWAL case.
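To make the proposal concrete, here is a minimal sketch, assuming a simplified
model in which each sync request is a queued future keyed by its WAL txid.
The class `SyncQueueSketch`, its nested `PendingSync`, and the methods
`requestSync` and `failAllPending` are hypothetical names for illustration,
not HBase's actual FSHLog/SyncFuture code. The point it shows: when an append
fails, every outstanding sync request is completed exceptionally, so none is
left isolated waiting for a later sync that will never come.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.CompletableFuture;

// Hypothetical, simplified model of a WAL sync queue (NOT HBase's real classes).
public class SyncQueueSketch {

    // A pending sync request: the WAL txid it waits for, plus a future the caller blocks on.
    static final class PendingSync {
        final long txid;
        final CompletableFuture<Long> future = new CompletableFuture<>();

        PendingSync(long txid) {
            this.txid = txid;
        }
    }

    // Outstanding sync requests, oldest first.
    private final Deque<PendingSync> pending = new ArrayDeque<>();

    // Enqueue a sync request for the given txid and return it so the caller can wait on it.
    synchronized PendingSync requestSync(long txid) {
        PendingSync s = new PendingSync(txid);
        pending.addLast(s);
        return s;
    }

    // Called when an append fails: instead of leaving earlier sync requests queued
    // (where no later sync would ever release them), fail every outstanding one.
    // Returns how many requests were failed.
    synchronized int failAllPending(Throwable cause) {
        int failed = 0;
        while (!pending.isEmpty()) {
            pending.pollFirst().future.completeExceptionally(cause);
            failed++;
        }
        return failed;
    }
}
```

Callers blocked in `future.get()` then receive an `ExecutionException` wrapping
the append failure instead of hanging, which is the behavior the abort path needs.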
bq. It is a weakness of the implementation that every append must be followed
by a sync, else the machinery gets stuck.
This is what I ran into when I tried to use the ring buffer with AsyncWAL. But
reading the FSHLog code, I found things are much better there: every time the
head of the queue was removed, we set the highestSyncID to that current sync
ID. Any other syncs in the syncFutures were then checked, and those whose txid
was greater than this were skipped rather than marked done. I am not very sure
about the failure case, though. Still, this append-followed-by-sync mechanism
is what causes such bugs.
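The completion rule described above can be sketched roughly as follows,
assuming sync requests are queued in ascending txid order. `HighestSyncSketch`,
`onSyncCompleted`, and `PendingSync` are hypothetical names, and this is a
simplified model, not the actual FSHLog logic. Each completed filesystem sync
raises the highest synced txid; queued futures at or below it are marked done,
and any future with a greater txid is skipped and stays pending.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.CompletableFuture;

// Hypothetical model of "complete sync futures up to the highest synced txid"
// (NOT HBase's real FSHLog code).
public class HighestSyncSketch {

    static final class PendingSync {
        final long txid;
        final CompletableFuture<Long> future = new CompletableFuture<>();

        PendingSync(long txid) {
            this.txid = txid;
        }
    }

    // Assumption: requests are enqueued in ascending txid order.
    private final Deque<PendingSync> pending = new ArrayDeque<>();
    private long highestSyncedTxid = 0;

    synchronized PendingSync requestSync(long txid) {
        PendingSync s = new PendingSync(txid);
        pending.addLast(s);
        return s;
    }

    // Called after a filesystem sync covering everything up to syncedTxid finishes.
    // Completes every queued request the sync covers and returns how many were completed.
    synchronized int onSyncCompleted(long syncedTxid) {
        highestSyncedTxid = Math.max(highestSyncedTxid, syncedTxid);
        int done = 0;
        // Stop at the first request with a greater txid: it is skipped, not marked done.
        while (!pending.isEmpty() && pending.peekFirst().txid <= highestSyncedTxid) {
            pending.pollFirst().future.complete(highestSyncedTxid);
            done++;
        }
        return done;
    }
}
```

In this model, a request whose txid is never covered by a later
`onSyncCompleted` call simply stays pending forever, which is the isolated-sync
hang this JIRA is about when no further sync follows a failed append.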

> RegionServer hang when aborting
> -------------------------------
>
>                 Key: HBASE-16960
>                 URL: https://issues.apache.org/jira/browse/HBASE-16960
>             Project: HBase
>          Issue Type: Bug
>            Reporter: binlijin
>            Assignee: binlijin
>         Attachments: HBASE-16960.patch, HBASE-16960_master_v2.patch, 
> RingBufferEventHandler.png, RingBufferEventHandler_exception.png, 
> SyncFuture.png, SyncFuture_exception.png, rs1081.jstack
>
>
> We have seen the regionserver hang when aborting several times; this takes
> all regions on this regionserver out of service, and then all affected
> applications stop working.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
