[
https://issues.apache.org/jira/browse/HBASE-16960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15621313#comment-15621313
]
ramkrishna.s.vasudevan commented on HBASE-16960:
------------------------------------------------
bq. But the problem in this JIRA is some case that there's no further syncs after
append fails, and causing an isolated sync then infinite wait. The proposal
will try to clean previous non-synced syncFutures so it won't leave any
isolated one, and don't break any existing logic.
This is true. In fact, I am also looking into this possibility, but only for
the AsyncWAL case.
bq. It is a weakness of the implementation that every append must be followed
by a sync else the machinery gets stuck.
This is what I ran into when I tried to use the ring buffer with AsyncWAL. But
reading the FSHLog code, I found things are much better there, because every
time the head of the queue is removed, highestSyncID is set to that current
sync id.
Any other sync in the syncFuture is then checked, and if its txid is greater
than this value, we skip marking it done. I am not very sure about the failure
case, though. Either way, this append-followed-by-sync mechanism is what causes
such bugs.
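
To make the bookkeeping above concrete, here is a minimal sketch of the idea.
It is a simplified model with hypothetical names (SyncTrackerSketch,
PendingSync, syncCompleted, appendFailed), not the actual FSHLog or AsyncWAL
code. It assumes a single thread drives the tracker. On a successful sync it
raises the highest synced txid and releases only the waiters at or below it;
on a failed append it fails every pending waiter, which is roughly what the
proposal in this JIRA aims for so that no isolated sync waits forever.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;

/** Hypothetical, simplified model; not the actual FSHLog or AsyncWAL code. */
public class SyncTrackerSketch {

  /** Stand-in for a sync future: done once its txid is synced or failed. */
  public static final class PendingSync {
    public final long txid;
    public boolean done;
    public Throwable error;

    PendingSync(long txid) {
      this.txid = txid;
    }
  }

  private final Deque<PendingSync> pending = new ArrayDeque<>();
  private long highestSyncedTxid = -1L;

  /** Registers a waiter for the given txid. */
  public PendingSync requestSync(long txid) {
    PendingSync p = new PendingSync(txid);
    pending.addLast(p);
    return p;
  }

  /** A sync up to syncedTxid succeeded: release covered waiters, skip the rest. */
  public int syncCompleted(long syncedTxid) {
    highestSyncedTxid = Math.max(highestSyncedTxid, syncedTxid);
    int released = 0;
    for (Iterator<PendingSync> it = pending.iterator(); it.hasNext();) {
      PendingSync p = it.next();
      if (p.txid > highestSyncedTxid) {
        continue; // not covered by this sync; leave it pending
      }
      p.done = true;
      it.remove();
      released++;
    }
    return released;
  }

  /** An append failed: fail every pending waiter so none is left isolated. */
  public int appendFailed(Throwable cause) {
    int failed = 0;
    PendingSync p;
    while ((p = pending.poll()) != null) {
      p.error = cause;
      p.done = true;
      failed++;
    }
    return failed;
  }

  public int pendingCount() {
    return pending.size();
  }
}
```

In this model, a waiter with a txid above the last synced one stays queued
until either a later sync covers it or appendFailed releases it with an error.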
> RegionServer hang when aborting
> -------------------------------
>
> Key: HBASE-16960
> URL: https://issues.apache.org/jira/browse/HBASE-16960
> Project: HBase
> Issue Type: Bug
> Reporter: binlijin
> Assignee: binlijin
> Attachments: HBASE-16960.patch, HBASE-16960_master_v2.patch,
> RingBufferEventHandler.png, RingBufferEventHandler_exception.png,
> SyncFuture.png, SyncFuture_exception.png, rs1081.jstack
>
>
> We have seen the regionserver hang while aborting several times. This takes
> all regions on this regionserver out of service, and then all affected
> applications stop working.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)