Youjie Chen created SLIDER-939:
----------------------------------
Summary: flex down does not cancel the outstanding request
Key: SLIDER-939
URL: https://issues.apache.org/jira/browse/SLIDER-939
Project: Slider
Issue Type: Bug
Components: core
Affects Versions: Slider 0.80
Environment: Hadoop 2.7.1
Slider 0.80.0
Reporter: Youjie Chen
Fix For: Slider 0.81
I run a Slider app on a 6-node cluster. To ensure there is only one
component (worker) instance on each node, I set yarn.memory to 51% of the total
memory.
Then I flexed up to 7 workers. This leaves one outstanding worker request
that can never be met, which is expected.
Then I flexed back down to 6 workers. After that, any container request for
any job was blocked, even though there was plenty of memory and cores
available for the job. The RM log shows this message repeating continuously:
capacity.CapacityScheduler
(CapacityScheduler.java:allocateContainersToNode(1240)) - Skipping scheduling
since node test.example.com:45454 is reserved by application
appattempt_1442384698868_0008_000001
It seems the outstanding request is not actually cancelled in the container
request queue on flex down, so the application keeps requesting the container.
Only after I flexed down to 5 workers could the other blocked jobs run.
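To illustrate the behaviour I would expect on flex down, here is a minimal
sketch. It is not Slider's actual code: the class FlexDownSketch, the
RoleState type and its fields are hypothetical names used only for this
example. It assumes that flexing down should cancel outstanding requests
before releasing any running container. In a real YARN application master
the cancel step would presumably map to a call such as
AMRMClient.removeContainerRequest, but that mapping is an assumption.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical model of one role's container bookkeeping; not Slider's classes. */
final class FlexDownSketch {

    static final class RoleState {
        int running;                                           // containers currently allocated
        final Deque<String> outstanding = new ArrayDeque<>();  // request ids still pending at the RM
        int cancelled;                                         // pending requests cancelled so far
        int released;                                          // running containers released so far

        /** Records one more pending container request (stand-in for asking the RM). */
        void request(String id) {
            outstanding.addLast(id);
        }

        /** Shrinks the role to the desired instance count. */
        void flexTo(int desired) {
            int excess = running + outstanding.size() - desired;
            // Step 1: cancel pending requests first. They hold no container yet,
            // but an uncancelled request may keep a node reserved.
            while (excess > 0 && !outstanding.isEmpty()) {
                outstanding.removeLast();
                cancelled++;
                excess--;
            }
            // Step 2: only then release running containers.
            while (excess > 0 && running > 0) {
                running--;
                released++;
                excess--;
            }
        }
    }

    public static void main(String[] args) {
        RoleState worker = new RoleState();
        worker.running = 6;       // one worker per node on the 6-node cluster
        worker.request("req-7");  // the 7th worker that can never be placed
        worker.flexTo(6);         // flex down from 7 to 6
        System.out.println("running=" + worker.running
                + " outstanding=" + worker.outstanding.size()
                + " cancelled=" + worker.cancelled
                + " released=" + worker.released);
    }
}
```

In this model, flexing from 7 to 6 cancels the one pending request and
touches no running worker, so nothing is left to hold a reservation on a
node. The behaviour I observed looks as if step 1 were skipped.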
This is related to JIRA https://issues.apache.org/jira/browse/SLIDER-490