[ https://issues.apache.org/jira/browse/HBASE-2421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12854673#action_12854673 ]
ryan rawson commented on HBASE-2421:
------------------------------------
The solution is _not_ to switch to using "getRegionLocationForRowWithRetries",
since there is no single row involved in this call: it is a series of rows that
span multiple regions. getRegionServerWithRetries probably needs to detect a
refused connection and fail faster.
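
A minimal sketch of what "fail faster" could look like, assuming a generic
retry helper. The class FailFastRetry, the method callWithRetries, and its
parameters are hypothetical names for illustration; this is not the HBase
client code or the actual patch. The idea is only that a refused connection
is rethrown at once, while other IOExceptions still get retried with backoff.

```java
import java.io.IOException;
import java.net.ConnectException;
import java.util.concurrent.Callable;

// Hypothetical helper for illustration only; not part of the HBase API.
final class FailFastRetry {

    private FailFastRetry() {}

    /**
     * Runs the call up to maxRetries times, sleeping a little longer after
     * each failed attempt. A ConnectException (connection refused) is
     * rethrown immediately, since retrying the same dead server cannot help.
     */
    static <T> T callWithRetries(Callable<T> call, int maxRetries, long basePauseMs)
            throws Exception {
        if (maxRetries < 1) {
            throw new IllegalArgumentException("maxRetries must be at least 1");
        }
        IOException last = null;
        for (int attempt = 0; attempt < maxRetries; attempt++) {
            try {
                return call.call();
            } catch (ConnectException e) {
                // Server is unreachable: give up now so the caller can react.
                throw e;
            } catch (IOException e) {
                // Possibly transient: remember it and try again.
                last = e;
            }
            if (attempt < maxRetries - 1) {
                // Linear backoff between attempts (an assumption of this sketch).
                Thread.sleep(basePauseMs * (attempt + 1));
            }
        }
        throw new IOException("Failed after " + maxRetries + " attempts", last);
    }
}
```

With something along these lines, a caller sending a batch of Puts would see
the refused connection after the first attempt and could look up the region
locations again instead of waiting out all 10 retries.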
> Put hangs for 10 retries on failed region servers
> -------------------------------------------------
>
> Key: HBASE-2421
> URL: https://issues.apache.org/jira/browse/HBASE-2421
> Project: Hadoop HBase
> Issue Type: Bug
> Reporter: Jean-Daniel Cryans
> Assignee: ryan rawson
> Priority: Critical
> Fix For: 0.20.5, 0.21.0
>
>
> Since MultiPut got in, instead of calling getRegionLocationForRowWithRetries
> we now call getRegionServerWithRetries to send an array list of Puts. The
> problem is that if the region server has failed, we still retry 10 times
> with backoff even though every attempt gets a connection refused. This is
> also true for a single Put since it's the same code path.
> Marking as critical since it almost disables our responsiveness to machine
> failures in certain cases where we are already sending a batch of edits when
> the server fails. Assigning to Ryan since he's been there recently.