[ 
https://issues.apache.org/jira/browse/IMPALA-9716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17103906#comment-17103906
 ] 

ASF subversion and git services commented on IMPALA-9716:
---------------------------------------------------------

Commit e8d17948fb94e9a5cf8b6cfce9e05d51ebb65492 in impala's branch 
refs/heads/master from Thomas Tauber-Marshall
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=e8d1794 ]

IMPALA-9716: Add jitter to the exponential backoff in status reporting

When status reports fail, we retry sending them with exponential
backoff. However, the backoff is currently deterministic, which leads
to a thundering herd problem: all of the backends for a particular
query may try to report at the same time, the coordinator is
overwhelmed and rejects some of the RPCs, and the backends then all
back off by the same amount and retry at the same time, overwhelming
the coordinator again.

This patch alleviates this problem by adding some random jitter to the
exponential backoff used when a status report fails.
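
As a rough illustration of the idea (not the code from this commit), a
jittered exponential backoff could be sketched as below. The function name,
parameters, default values, and the choice of a symmetric multiplicative
jitter are all assumptions made for the example; the actual patch may compute
the jitter differently.

```python
import random


def backoff_delay_ms(attempt, base_ms=100, max_ms=10_000,
                     jitter_fraction=0.5, rng=random):
    """Delay in ms before retry number `attempt` (1-based), with jitter.

    All names and defaults here are hypothetical, not Impala's.
    """
    # Deterministic exponential component, capped at max_ms.
    deterministic = min(max_ms, base_ms * 2 ** (attempt - 1))
    # Random factor in [1 - jitter_fraction, 1 + jitter_fraction].
    factor = 1.0 + rng.uniform(-jitter_fraction, jitter_fraction)
    return deterministic * factor


# Ten hypothetical backends whose reports were rejected at the same moment,
# each computing the delay before its third retry.
rng = random.Random(42)
no_jitter = [backoff_delay_ms(3, jitter_fraction=0.0, rng=rng)
             for _ in range(10)]
with_jitter = [backoff_delay_ms(3, jitter_fraction=0.5, rng=rng)
               for _ in range(10)]

print("distinct delays without jitter:", len(set(no_jitter)))
print("distinct delays with jitter:   ", len(set(with_jitter)))
```

Without jitter every backend computes the same delay, so the retries arrive
together again. With jitter the delays are spread over a range around the
deterministic value, which is what breaks up the herd.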

Testing:
- Passed a full run of existing tests.
- Code path is covered by test_reportexecstatus_retries

Change-Id: Id05c224517aa606057117328f480dfa98676b923
Reviewed-on: http://gerrit.cloudera.org:8080/15860
Reviewed-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com>


> Add jitter to the exponential backoff in status reporting
> ---------------------------------------------------------
>
>                 Key: IMPALA-9716
>                 URL: https://issues.apache.org/jira/browse/IMPALA-9716
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Distributed Exec
>    Affects Versions: Impala 4.0
>            Reporter: Thomas Tauber-Marshall
>            Assignee: Thomas Tauber-Marshall
>            Priority: Major
>
> When status reports fail, we retry sending them with exponential backoff. 
> However, the backoff is currently deterministic, which leads to a 
> thundering herd problem: all of the backends for a particular query may 
> try to report at the same time, the coordinator is overwhelmed and rejects 
> some of the RPCs, and the backends then all back off by the same amount and 
> retry at the same time, overwhelming the coordinator again.
> We can help solve this by adding some random jitter to the exponential 
> backoff time.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org
