[
https://issues.apache.org/jira/browse/HBASE-16960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15632308#comment-15632308
]
Mikhail Antonov commented on HBASE-16960:
-----------------------------------------
Good job!
Skimmed the patch, looks good to me, but I want to get back and dig into it more
this week to see if there are similar possible issues around it. Appends
failing due to socket timeouts on DNs are to be expected, I'd say, but I don't
think I've seen this... How bad is this for you? How frequently do you see it,
[~carp84] and [~aoxiang]?
"Actually binlijin and I also observed more questions on whether the current
implementation could assure the semantic that "failed appends won't get synced
successfully", and we're still digging into it. Will open another JIRA if any
solution."
Any follow-ups on that? It seems there are a few other changes to the WALs
either done or in flight, but they seem too big to get into 1.3.0 and need to be
carefully stress tested. I'm thinking of moving it to 1.3.1, where I'd bring in
those changes and let them bake. Thoughts? (That depends on how bad this issue is.)
> RegionServer hang when aborting
> -------------------------------
>
> Key: HBASE-16960
> URL: https://issues.apache.org/jira/browse/HBASE-16960
> Project: HBase
> Issue Type: Bug
> Reporter: binlijin
> Assignee: binlijin
> Attachments: 16960.ut.missing.final.piece.txt,
> HBASE-16960.branch-1.v1.patch, HBASE-16960.patch,
> HBASE-16960_master_v2.patch, HBASE-16960_master_v3.patch,
> HBASE-16960_master_v4.patch, RingBufferEventHandler.png,
> RingBufferEventHandler_exception.png, SyncFuture.png,
> SyncFuture_exception.png, rs1081.jstack
>
>
> We have seen a regionserver hang when aborting several times, which takes all
> regions on this regionserver out of service, and then all affected
> applications stop working.