More evidence: Spark is also affected:
https://issues.apache.org/jira/browse/SPARK-2687

One more relevant YARN JIRA:
https://issues.apache.org/jira/browse/YARN-1902

--
Shrijeet

On Mon, Mar 23, 2015 at 10:15 AM, Shrijeet Paliwal <[email protected]> wrote:

> Hello,
>
> *Context:*
>
> We were seeing very aggressive preemption by the Fair Scheduler, and 98%
> of the preemption activity was triggered by the Slider queue's needs. The
> Slider queue is a stable queue, i.e. its containers don't churn, and it
> has been given a fair share guarantee larger than it needs (high weight
> and a min share double its steady-state needs). So it was puzzling to see
> it triggering preemption. When I turned on debug logging for the Fair
> Scheduler, I noticed the scheduler's demand update thread reporting
> unusually high demand from the Slider queue.
>
> My initial thought was a bug in the scheduler, but I later concluded that
> it is Slider's problem, caused not by its own code but by the AMRMClient
> code. I can deterministically reproduce the issue on my laptop running a
> pseudo YARN+Slider setup. I traced it to an open issue:
> https://issues.apache.org/jira/browse/YARN-3020
>
> *The problem:*
>
> 1. A region server fails for the first time. Slider notices it and
>    registers a request with the RM via AMRMClient for a new container. At
>    this time AMRMClient caches the allocation request with the 'Resource'
>    (a data structure with memory, cpu & priority) as key.
>    (source: AMRMClientImpl.java, cache is remoteRequestsTable)
> 2. A region server fails again. Slider notices it and again registers a
>    request with the RM via AMRMClient for one new container. AMRMClient
>    finds a similar Resource request in its cache (the memory, cpu and
>    priority for an RS obviously don't change) and adds 1 to the container
>    count before putting it on the wire. *NOTE*: Slider didn't need 2
>    containers, but ends up receiving 2. When the containers are allocated,
>    Slider keeps one and discards the other.
> 3. As explained in YARN-3020, with subsequent failures we keep asking for
>    more and more containers when in reality we always need one.
>
> For the Fair Scheduler this means demand keeps going up. It doesn't know
> that Slider ends up discarding the surplus containers. In order to satisfy
> the demand it kills containers mercilessly. Needless to say, this is not
> triggered only by container failure; even flexing should trigger it.
>
> *The fix:*
>
> Rumor is that AMRMClient doesn't have a bug and that this is intended
> behaviour (source: comments in YARN-3020). The claim is that on receiving
> a container the client should clear the cached request by calling a
> method called 'removeContainerRequest'. Slider isn't following the
> protocol correctly; in Slider's defense, the protocol is not well defined.
>
> Thoughts?
> --
> Shrijeet
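
To make the counting behaviour in the quoted message concrete, here is a toy model in Java. It is only a sketch and not the Hadoop code: PendingAskTable, addRequest, removeRequest, askAfterFailures and the "shape" string are hypothetical names made up for illustration, standing in for the client-side request table and for AMRMClient's addContainerRequest / removeContainerRequest calls. It assumes, as described above, that the pending count is only decremented when the application explicitly removes the request.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Toy model of the client-side bookkeeping described in the thread.
 * Hypothetical names throughout; this is not AMRMClientImpl.
 */
class PendingAskTable {
    // Outstanding container count per request shape (priority + resource).
    private final Map<String, Integer> pending = new HashMap<>();

    /** Models adding a request: an identical shape bumps the cached count. */
    int addRequest(String shape) {
        return pending.merge(shape, 1, Integer::sum);
    }

    /** Models removing a request after a container for it was received. */
    int removeRequest(String shape) {
        int current = pending.getOrDefault(shape, 0);
        if (current <= 1) {
            pending.remove(shape);
            return 0;
        }
        pending.put(shape, current - 1);
        return current - 1;
    }

    /**
     * Simulates successive single-container failures and returns the
     * container count that would be sent to the RM on the last one.
     */
    static int askAfterFailures(int failures, boolean removeOnAllocation) {
        PendingAskTable table = new PendingAskTable();
        String shape = "priority=1,memory=4096,vcores=2"; // assumed values
        int lastAsk = 0;
        for (int i = 0; i < failures; i++) {
            lastAsk = table.addRequest(shape);
            // The RM allocates and the application keeps one container.
            if (removeOnAllocation) {
                table.removeRequest(shape);
            }
        }
        return lastAsk;
    }
}
```

In this model, askAfterFailures(3, false) returns 3 (the ask grows with every failure), while askAfterFailures(3, true) returns 1, which is what the application actually needs. In a real application master the equivalent step would presumably sit in the container-allocated callback, calling removeContainerRequest with the matching ContainerRequest.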
