Yi Pan (Data Infrastructure) created SAMZA-1893:
---------------------------------------------------

             Summary: JobNodeConfigurationGenerator should only generate 
configurations for operators, streams, stores, and tables that are reachable by 
a JobNode
                 Key: SAMZA-1893
                 URL: https://issues.apache.org/jira/browse/SAMZA-1893
             Project: Samza
          Issue Type: Improvement
            Reporter: Yi Pan (Data Infrastructure)


Currently, the planner does not yet generate multi-job ExecutionPlans. Hence, 
the current implementation of JobNodeConfigurationGenerator does not strictly 
follow the rule of generating configuration only for the 
input/output/intermediate streams, operators, stores, and tables reachable by 
the current JobNode (i.e., in a single-JobNode plan, everything is reachable 
by the only JobNode).

When we extend it to multi-job plans, we need to generate configurations 
only for the streams, operators, stores, and tables that are reachable by a 
given JobNode. If two JobNodes generate configuration for the same resources, 
it will result in the following problems:
1) input streams are consumed multiple times, unnecessarily
2) a store's changelog will be written by multiple jobs, creating consistency 
issues during recovery
3) tables will be accessed (read or written) by multiple jobs, also creating 
consistency issues
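
To make "reachable by a JobNode" concrete, here is a minimal sketch of the 
traversal, not the actual JobNodeConfigurationGenerator code. All names 
(ReachabilitySketch, reachableFrom, the "stream:"/"op:"/"store:"/"table:" 
node IDs) are hypothetical, and the plan is assumed to be representable as a 
simple map from each node to its downstream nodes. Streams that mark a job 
boundary are included in the result but not traversed past, unless they are 
one of the job's own inputs.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch only: this is NOT Samza's JobNode or
// JobNodeConfigurationGenerator API.
final class ReachabilitySketch {

  /**
   * Collects every node ID (stream, operator, store, or table) reachable from
   * the job's input streams by following the edges in {@code downstream}.
   * Nodes listed in {@code boundaries} (assumed to be the intermediate streams
   * that separate jobs) are collected but not expanded, unless they are one of
   * this job's own inputs.
   */
  static Set<String> reachableFrom(List<String> jobInputs,
                                   Map<String, List<String>> downstream,
                                   Set<String> boundaries) {
    Set<String> visited = new LinkedHashSet<>();
    Deque<String> pending = new ArrayDeque<>(jobInputs);
    while (!pending.isEmpty()) {
      String node = pending.removeFirst();
      if (!visited.add(node)) {
        continue; // already handled
      }
      boolean stopHere = boundaries.contains(node) && !jobInputs.contains(node);
      if (stopHere) {
        continue; // belongs to this job as an output, but the next job owns what follows
      }
      for (String next : downstream.getOrDefault(node, List.of())) {
        pending.addLast(next);
      }
    }
    return visited;
  }
}
```

Under these assumptions, a generator would emit configuration only for the 
IDs in the returned set. For a plan where "stream:page-views" feeds 
"op:filter", which writes to "store:counts" and "stream:intermediate", a job 
whose only input is "stream:page-views" would get exactly those four IDs and 
nothing downstream of the intermediate stream.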

We need to make sure that JobNodes in multi-job plans do not generate 
colliding configurations for input/output/intermediate streams, state stores, 
and tables.
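
One way such a guarantee could be checked is sketched below. This is an 
illustration under stated assumptions, not an existing Samza utility: the 
class and method names are hypothetical, resources are identified by plain 
strings, and the caller is assumed to pass in only the resources that must 
belong to exactly one job (for example, stores and tables).

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch only: not part of Samza.
final class ConfigCollisionSketch {

  /**
   * Given, for each job, the set of resource IDs it would generate
   * configuration for, returns the resources claimed by two or more jobs,
   * mapped to the names of the jobs that claim them.
   */
  static Map<String, Set<String>> findCollisions(Map<String, Set<String>> resourcesByJob) {
    // Invert the mapping: resource ID -> jobs that configure it.
    Map<String, Set<String>> jobsByResource = new TreeMap<>();
    for (Map.Entry<String, Set<String>> entry : resourcesByJob.entrySet()) {
      for (String resource : entry.getValue()) {
        jobsByResource.computeIfAbsent(resource, k -> new TreeSet<>()).add(entry.getKey());
      }
    }
    // Keep only the resources with more than one owner.
    jobsByResource.values().removeIf(jobs -> jobs.size() < 2);
    return jobsByResource;
  }
}
```

An empty result would mean no two jobs configure the same resource. A planner 
could run a check like this after generating per-job configurations and fail 
fast when the result is non-empty.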



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
