[
https://issues.apache.org/jira/browse/STORM-3602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ethan Li resolved STORM-3602.
-----------------------------
Fix Version/s: 2.1.1, 2.2.0
Resolution: Fixed
Thanks [~agresch]. I merged this to master and 2.1.x-branch.
> loadaware shuffle can overload local worker
> -------------------------------------------
>
> Key: STORM-3602
> URL: https://issues.apache.org/jira/browse/STORM-3602
> Project: Apache Storm
> Issue Type: Bug
> Reporter: Aaron Gresch
> Assignee: Aaron Gresch
> Priority: Major
> Labels: pull-request-available
> Fix For: 2.2.0, 2.1.1
>
> Time Spent: 1h
> Remaining Estimate: 0h
>
> We saw a worker become overloaded and tuples time out with loadaware
> shuffle enabled. Investigation showed that the code allows switching
> from host local to worker local when the load average across the host
> tasks is below the low water mark. It should check the load on the
> worker instead.
>
> What happens is that the worker is overloaded while many other host local
> tasks sit idle, so the grouping switches to HOST_LOCAL. The load average
> across all the host tasks is then below the low water mark, so it
> immediately switches back to the overloaded worker local task.
>
>
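The oscillation described above can be reproduced with a small model. This is only an illustrative sketch in Python, not the Storm code or the merged patch: the function names, the threshold values (0.2 and 0.8), the load figures, and the assumption that loads stay constant between refreshes are all hypothetical. It contrasts narrowing the scope based on the host-wide average (the reported behavior) with narrowing based on the worker's own load (the behavior the report asks for).

```python
from statistics import mean

WORKER_LOCAL, HOST_LOCAL = "WORKER_LOCAL", "HOST_LOCAL"

# Hypothetical thresholds and loads, chosen only to illustrate the report.
LOW, HIGH = 0.2, 0.8
WORKER_LOADS = [0.9]                # the single overloaded worker-local task
HOST_LOADS = [0.9] + [0.0] * 9      # same task plus nine idle host-local tasks


def next_scope_buggy(scope):
    """Reported behavior: narrowing is decided from the host-wide average."""
    if scope == WORKER_LOCAL:
        # Widen when the worker-local tasks are overloaded.
        return HOST_LOCAL if mean(WORKER_LOADS) >= HIGH else WORKER_LOCAL
    # Idle host tasks drag the average under LOW, so the scope narrows again.
    return WORKER_LOCAL if mean(HOST_LOADS) < LOW else HOST_LOCAL


def next_scope_fixed(scope):
    """Requested behavior: narrow only if the worker itself is lightly loaded."""
    if scope == WORKER_LOCAL:
        return HOST_LOCAL if mean(WORKER_LOADS) >= HIGH else WORKER_LOCAL
    return WORKER_LOCAL if mean(WORKER_LOADS) < LOW else HOST_LOCAL


def simulate(step, rounds=4):
    """Apply `step` repeatedly, starting from WORKER_LOCAL, with constant loads."""
    scope = WORKER_LOCAL
    history = [scope]
    for _ in range(rounds):
        scope = step(scope)
        history.append(scope)
    return history


if __name__ == "__main__":
    print("buggy:", simulate(next_scope_buggy))
    print("fixed:", simulate(next_scope_fixed))
```

Under these assumed numbers the first variant flips between the two scopes on every refresh, while the second stays at HOST_LOCAL for as long as the worker remains overloaded.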