----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/44820/ -----------------------------------------------------------
(Updated March 15, 2016, 3:19 a.m.) Review request for samza, Boris Shkolnik, Jake Maes, Yi Pan (Data Infrastructure), Navina Ramesh, and Xinyu Liu. Repository: samza Description (updated) ------- The ContainerRequestState class is currently not thread-safe. The class's methods and state variables are called from both the ContainerAllocator and the AMRMCallback handler. Here's an analysis I summarized by looking at the current implementation. From the below table, getContainersOnAHost() returns the entire list of containers to the AbstractContainerAllocator which reads it. However, the Callback thread invokes methods that write to the allocatedContainerQueue. (causing a potential race condition) +-----------------------------+--------------+-----------------+-------------------------+---------------------+ | Method | requestQueue | hostsToCountMap | allocatedContainerQueue | ThreadsAccesedFrom | +-----------------------------+--------------+-----------------+-------------------------+---------------------+ | updateRequestState | write | write/read | write | Callback, Allocator | +-----------------------------+--------------+-----------------+-------------------------+---------------------+ | addContainer | | read | write | Callback | +-----------------------------+--------------+-----------------+-------------------------+---------------------+ | updateStateAfterAssignment | write | write | read | Allocator | +-----------------------------+--------------+-----------------+-------------------------+---------------------+ | releaseExtraContainers | write | write | write | Allocator | +-----------------------------+--------------+-----------------+-------------------------+---------------------+ | releaseUnstartableContainer | | | | Allocator | +-----------------------------+--------------+-----------------+-------------------------+---------------------+ | getContainersOnAHost | | | read | Allocator | +-----------------------------+--------------+-----------------+-------------------------+---------------------+ | getRequestsQueue | read | | | Allocator | +-----------------------------+--------------+-----------------+-------------------------+---------------------+ I'm proposing to make the ContainerRequestState class thread-safe by: i) Expose more granular synchronized APIs for things like: 1. Getting num containers on a host. 2. Getting the number of pending requests. 3. Checking if the queue is empty etc. / peeking a queue etc. ii) Getting rid of the non-synchronized methods. iii) Ensuring that methods that return a map/a list, don't return the original map/list, but a defensive copy instead. The other methods are already synchronized. This improvement is beneficial because: 1. This allows us to reason about concurrency more effectively when using the request-state in multi-threaded contexts. 2. It's cleaner for the ContainerRequestState class to expose higher level methods and not expose the guts of the class in its entirety. --- Diff 2--- Modified the getContainersOnHost method to make a defensive copy. Diffs (updated) ----- samza-yarn/src/main/java/org/apache/samza/job/yarn/AbstractContainerAllocator.java b4789e62beb1120f11a8101664b10c34ae930e58 samza-yarn/src/main/java/org/apache/samza/job/yarn/ContainerRequestState.java 3e3f48ce2b5c0802e8c7c4f09b632df2d2265c12 samza-yarn/src/test/java/org/apache/samza/job/yarn/TestContainerAllocator.java 2b1bdab3c8de3184e930c244a8cae55813c33565 samza-yarn/src/test/java/org/apache/samza/job/yarn/TestContainerRequestState.java 402fe784120ac40ed542d9fa60d6a6d7df9c8cda samza-yarn/src/test/java/org/apache/samza/job/yarn/TestHostAwareContainerAllocator.java 0c7a09f3e4c4c2ce6788be729d0bf4a294243c68 samza-yarn/src/test/java/org/apache/samza/job/yarn/TestSamzaTaskManager.java 9da1edf6ff165ef0306de8730853ad30551a9831 Diff: https://reviews.apache.org/r/44820/diff/ Testing ------- All unit tests pass. Thanks, Jagadish Venkatraman