[ https://issues.apache.org/jira/browse/HADOOP-12421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14877136#comment-14877136 ]
Steve Loughran commented on HADOOP-12421:
-----------------------------------------
Worth fixing. FWIW, I've encountered this in a large embedded-systems project
where all the SSD-based embedded devices booted at exactly the same time after
a facility-wide power cycle. They overloaded the TCP links to some servers, and
all of them backed off at exactly the same rate. Even though they had jitter,
it was driven off time-since-boot, so the jitter was in sync too.
Moral: choose the randomness source for your jitter carefully enough to handle
simultaneous cluster restarts.
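To make the moral concrete, here is a minimal Java sketch (Java because Hadoop is written in it). It assumes each device seeds `java.util.Random` from a hypothetical "seconds since boot" value; `JitterSeedDemo` and its methods are illustrative names, not part of Hadoop. Devices that booted together get identical seeds and therefore identical "random" delays, while seeding from `SecureRandom` gives each node an independent sequence.

```java
import java.security.SecureRandom;
import java.util.Random;

/** Illustrative only: why jitter seeded from a shared value does not desynchronize clients. */
final class JitterSeedDemo {

    /** Jitter in [0, maxJitterMillis) drawn from the given generator. */
    static int jitterMillis(Random rng, int maxJitterMillis) {
        return rng.nextInt(maxJitterMillis);
    }

    /** Pitfall: seed from time since boot, which is identical on devices that powered up together. */
    static Random seededFromUptime(long secondsSinceBoot) {
        return new Random(secondsSinceBoot);
    }

    /** Safer: seed from the platform's entropy source, so each node gets an independent sequence. */
    static Random seededFromEntropy() {
        return new Random(new SecureRandom().nextLong());
    }

    public static void main(String[] args) {
        long sharedUptime = 42; // hypothetical: every device reports the same uptime after the power cycle
        Random nodeA = seededFromUptime(sharedUptime);
        Random nodeB = seededFromUptime(sharedUptime);
        Random nodeC = seededFromEntropy();
        Random nodeD = seededFromEntropy();
        for (int attempt = 0; attempt < 3; attempt++) {
            // A and B always print the same value: their "jitter" keeps them in lockstep.
            // C and D almost certainly differ, so their retries spread out.
            System.out.printf("attempt %d: A=%d B=%d | C=%d D=%d%n", attempt,
                jitterMillis(nodeA, 1000), jitterMillis(nodeB, 1000),
                jitterMillis(nodeC, 1000), jitterMillis(nodeD, 1000));
        }
    }
}
```

The point is not the generator algorithm but the seed: any per-node value that is the same across a simultaneous restart (uptime, a fixed constant, a coarse clock) reproduces the synchronized backoff the comment describes.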
> Add jitter to RetryInvocationHandler
> ------------------------------------
>
> Key: HADOOP-12421
> URL: https://issues.apache.org/jira/browse/HADOOP-12421
> Project: Hadoop Common
> Issue Type: Bug
> Reporter: Elliott Clark
> Assignee: Elliott Clark
>
> Calls to the NN can become synchronized across a cluster during NN failover.
> This leads to a spike in requests until things recover, making an already
> tricky time worse.
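One common way to add jitter is exponential backoff with "full jitter": each retry sleeps for a uniformly random time in [0, min(cap, base * 2^attempt)]. The Java sketch below is a hypothetical helper, not Hadoop's `RetryPolicy` or `RetryInvocationHandler` API; the class name, parameters, and the 100 ms / 10 s values are assumptions chosen for illustration.

```java
import java.security.SecureRandom;
import java.util.Random;

/**
 * Hypothetical helper (not Hadoop's RetryPolicy/RetryInvocationHandler API):
 * exponential backoff with "full jitter", so clients that lost the NN at the
 * same instant spread their retries out instead of arriving together.
 */
final class JitteredBackoff {
    private final int baseMillis;
    private final int capMillis;
    private final Random rng;

    JitteredBackoff(int baseMillis, int capMillis, Random rng) {
        if (baseMillis <= 0 || capMillis < baseMillis || capMillis == Integer.MAX_VALUE) {
            throw new IllegalArgumentException("need 0 < baseMillis <= capMillis < Integer.MAX_VALUE");
        }
        this.baseMillis = baseMillis;
        this.capMillis = capMillis;
        this.rng = rng;
    }

    /** Upper bound of the sleep window for a 0-based attempt: min(cap, base * 2^attempt). */
    int windowMillis(int attempt) {
        // Clamp the exponent so the long shift cannot overflow (base < 2^31, shift <= 31).
        int shift = Math.min(Math.max(attempt, 0), 31);
        long window = ((long) baseMillis) << shift;
        return (int) Math.min(capMillis, window);
    }

    /** Uniformly random delay in [0, windowMillis(attempt)]. */
    int delayMillis(int attempt) {
        return rng.nextInt(windowMillis(attempt) + 1);
    }

    public static void main(String[] args) {
        // SecureRandom avoids the shared, time-derived seeds described in the comment above.
        JitteredBackoff backoff = new JitteredBackoff(100, 10_000, new SecureRandom());
        for (int attempt = 0; attempt < 8; attempt++) {
            System.out.printf("attempt %d: window %d ms, sleep %d ms%n",
                attempt, backoff.windowMillis(attempt), backoff.delayMillis(attempt));
        }
    }
}
```

Drawing from the whole window (rather than adding a small random offset to a fixed delay) trades a less predictable per-client delay for the widest possible spread of retries across the cluster.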