shanthoosh opened a new pull request #987: SAMZA-2158: Remove the redundant coordinator stream reads in the ApplicationMaster startup sequence.
URL: https://github.com/apache/samza/pull/987

**Changes:**
Currently, the input topic partitions assigned to each container of a Samza job are stored in the coordinator stream (a Kafka topic). In the samza-yarn ApplicationMaster startup sequence, the JobModel from the previous run of the job is read from the coordinator stream. The JobModel is read three times from the same Kafka topic, each time over a different connection. These redundant reads prolong the launch of containers by the samza-yarn ApplicationMaster. This fix removes the inefficiency by reading the coordinator stream only once, over a single connection. Note that the redundant reads and the extra connections only slowed down ApplicationMaster startup; they did not break functional correctness.

**Note:**
* In addition to the above problem, KafkaSystemAdmin is created multiple times for the same topic/system in the ApplicationMaster/Container; that will be fixed in SAMZA-2157.
* Tested this patch with a test samza-yarn job inside LinkedIn and also with a Beam job.
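
The idea behind the change can be pictured with a minimal sketch: read the expensive source once and hand the same snapshot to every startup step, instead of letting each step read on its own. This is only an illustration, not the code in this pull request. Every name below (`ReadOnceSketch`, `CountingReader`, the three step methods, and the sample assignment data) is hypothetical and is not a Samza API.

```java
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical illustration only; none of these names are Samza APIs.
final class ReadOnceSketch {

    /** Stand-in for an expensive coordinator stream read; counts how often it is called. */
    static final class CountingReader implements Supplier<Map<String, String>> {
        final AtomicInteger reads = new AtomicInteger();

        @Override
        public Map<String, String> get() {
            // A real reader would open a connection and consume the topic here.
            reads.incrementAndGet();
            return Map.of("container-0", "input-topic:0,input-topic:1");
        }
    }

    // Three made-up startup steps that each need the previous run's assignments.
    static int containerCount(Map<String, String> assignments) {
        return assignments.size();
    }

    static boolean hasAssignments(Map<String, String> assignments) {
        return !assignments.isEmpty();
    }

    static String assignmentFor(Map<String, String> assignments, String containerId) {
        return assignments.get(containerId);
    }

    /** Before: every step triggers its own read (three reads in total). */
    static String startupWithRedundantReads(CountingReader reader) {
        int count = containerCount(reader.get());
        boolean ok = hasAssignments(reader.get());
        String assignment = assignmentFor(reader.get(), "container-0");
        return count + "|" + ok + "|" + assignment;
    }

    /** After: one read, and the same snapshot is passed to every step. */
    static String startupWithSingleRead(CountingReader reader) {
        Map<String, String> snapshot = reader.get();
        int count = containerCount(snapshot);
        boolean ok = hasAssignments(snapshot);
        String assignment = assignmentFor(snapshot, "container-0");
        return count + "|" + ok + "|" + assignment;
    }

    public static void main(String[] args) {
        CountingReader before = new CountingReader();
        CountingReader after = new CountingReader();
        startupWithRedundantReads(before);
        startupWithSingleRead(after);
        System.out.println("reads before: " + before.reads.get());
        System.out.println("reads after: " + after.reads.get());
    }
}
```

Both variants compute the same result; only the number of reads differs, which matches the description above that the redundant reads affected startup time but not correctness.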
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

With regards,
Apache Git Services
