mynameborat opened a new pull request #1436:
URL: https://github.com/apache/samza/pull/1436


   **Problem**: Currently, we create the operator DAG deterministically by
backing it with a `LinkedHashMap`, so that at runtime the lifecycle of the
operators follows a deterministic order.
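   To illustrate the property being relied on: a `LinkedHashMap` iterates its
entries in insertion order, so walking its values to initialize operators
always visits them in the order they were added. This is a minimal sketch; the
`Op` class and `initOrder` helper are hypothetical stand-ins, not Samza's
actual operator classes.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class OperatorLifecycleOrder {
  // Hypothetical stand-in for an operator; not Samza's OperatorImpl API.
  static final class Op {
    final String id;

    Op(String id) {
      this.id = id;
    }
  }

  // Hypothetical helper: "initializes" operators by visiting the map's values,
  // which for a LinkedHashMap happens in insertion order.
  static List<String> initOrder(Map<String, Op> operators) {
    List<String> order = new ArrayList<>();
    for (Op op : operators.values()) {
      order.add(op.id); // stand-in for a real lifecycle call such as init
    }
    return order;
  }

  public static void main(String[] args) {
    Map<String, Op> operators = new LinkedHashMap<>();
    for (String id : new String[] {"A", "B", "C", "D", "E"}) {
      operators.put(id, new Op(id));
    }
    System.out.println(initOrder(operators)); // prints [A, B, C, D, E]
  }
}
```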
   
   While we use the same order to traverse the graph and create the DAG, we
lose the ordering within each sub-DAG because an operator's registered
operators are stored in a `HashSet`. As a result, the output of an operator is
dispatched to its downstream operators in a non-deterministic order, e.g.:
   
   ```
   Op A --> Op B --> Op C
      | --> Op D --> Op E
   ```
   
   The output of Op A may be dispatched to Op B or to Op D first, depending on
how the `registeredOperators` set of Op A happens to be iterated.
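   A small sketch of the fan-out above, assuming a simplified operator that
forwards its output by iterating its `registeredOperators` set. The `Op` class
and its `register`/`dispatchOrder` methods are hypothetical, not Samza's API.
With a `LinkedHashSet` the dispatch order always matches registration order;
with a `HashSet` the order is unspecified, and because `Op` uses identity hash
codes it can differ between JVM runs.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class FanOutDispatch {
  // Hypothetical, simplified operator (not Samza's OperatorImpl). It keeps the default
  // identity equals/hashCode, so HashSet iteration order can change between JVM runs.
  static final class Op {
    final String name;
    final Set<Op> registeredOperators;

    Op(String name, Set<Op> registeredOperators) {
      this.name = name;
      this.registeredOperators = registeredOperators;
    }

    // Hypothetical registration hook: adds a downstream operator to the fan-out set.
    void register(Op next) {
      registeredOperators.add(next);
    }

    // Returns downstream operator names in the order this operator would hand them
    // its output, i.e. the iteration order of registeredOperators.
    List<String> dispatchOrder() {
      List<String> order = new ArrayList<>();
      for (Op next : registeredOperators) {
        order.add(next.name);
      }
      return order;
    }
  }

  public static void main(String[] args) {
    // LinkedHashSet: dispatch order always matches registration order.
    Op ordered = new Op("A", new LinkedHashSet<>());
    ordered.register(new Op("B", new LinkedHashSet<>()));
    ordered.register(new Op("D", new LinkedHashSet<>()));
    System.out.println(ordered.dispatchOrder()); // prints [B, D]

    // HashSet: iteration order is unspecified, so B or D may come first.
    Op unordered = new Op("A", new HashSet<>());
    unordered.register(new Op("B", new HashSet<>()));
    unordered.register(new Op("D", new HashSet<>()));
    System.out.println(unordered.dispatchOrder());
  }
}
```

`LinkedHashSet` keeps the constant-time `add`/`contains` of a hash set and
additionally maintains a doubly linked list through its entries, trading a
little memory for a predictable iteration order.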
   
   Although Samza does not guarantee this ordering to applications, we want the
graph traversal order, the DAG insertion order, and the DAG traversal order to
be consistent with one another.
   
   **Change**: Use a `LinkedHashSet` instead of a `HashSet` so that iteration
follows insertion order, consistent with the rest of the DAG.
   **Tests**: Added a unit test to verify that insertion order is preserved
during traversal.
   **API Changes**: None
   **Usage Instructions**: None
   **Upgrade Instructions**: None
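   The PR's actual unit test is not included in this message. As a
dependency-free sketch of the property such a test could check, the
hypothetical `preservesInsertionOrder` helper below compares a set's iteration
order against the order in which elements were inserted.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class InsertionOrderCheck {
  // Hypothetical helper: true if iterating `set` yields exactly `inserted`, in order.
  // new ArrayList<>(set) copies the elements in the set's iteration order.
  static <T> boolean preservesInsertionOrder(Set<T> set, List<T> inserted) {
    return new ArrayList<>(set).equals(inserted);
  }

  public static void main(String[] args) {
    List<String> inserted = new ArrayList<>();
    Set<String> registeredOperators = new LinkedHashSet<>();
    // Register operators in descending order so the check does not rely on sorted input.
    for (int i = 100; i > 0; i--) {
      String id = "op-" + i;
      inserted.add(id);
      registeredOperators.add(id);
    }
    System.out.println(preservesInsertionOrder(registeredOperators, inserted)); // prints true
  }
}
```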

