jia-gao opened a new pull request, #1631: URL: https://github.com/apache/samza/pull/1631
Issue: Diagnostics topics for Samza jobs are currently created with a single partition. For some jobs this partition can grow large, so the topic's partition count may need to be increased. However, DiagnosticsManager in the samza-li framework uses the container's hostname as the partition key when emitting messages to Kafka. Because hostnames at LinkedIn are very similar, hashing them as the partition key will not distribute the messages evenly across partitions.

Change: Emit diagnostics messages in round-robin fashion instead. This can be achieved by setting the partition key to null.

Test Done: ./gradlew build
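
To illustrate the difference between the two keying strategies, here is a minimal, self-contained sketch. It is not the code in this PR and does not call Samza or Kafka. The class `PartitionKeySketch`, its methods, and the sample hostnames are hypothetical, and `String.hashCode()` is only a stand-in for whatever hash the real producer applies. How a real producer handles a null key depends on the client and its version, so the round-robin branch is an assumption made for the example.

```java
import java.util.Arrays;
import java.util.List;

public class PartitionKeySketch {

    // Hypothetical stand-in for a producer's partition choice.
    // Non-null key: partition is derived from a hash of the key.
    // Null key: partitions are used in round-robin order (assumed behavior).
    static int choosePartition(String key, int numPartitions, int[] roundRobinCounter) {
        if (key == null) {
            int partition = roundRobinCounter[0];
            roundRobinCounter[0] = (roundRobinCounter[0] + 1) % numPartitions;
            return partition;
        }
        // floorMod keeps the result non-negative even for negative hash codes.
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    // Counts how many messages land on each partition when every host
    // emits messagesPerHost messages.
    static int[] countPerPartition(List<String> hosts, int messagesPerHost,
                                   int numPartitions, boolean useHostnameAsKey) {
        int[] counts = new int[numPartitions];
        int[] roundRobinCounter = {0};
        for (String host : hosts) {
            for (int i = 0; i < messagesPerHost; i++) {
                String key = useHostnameAsKey ? host : null;
                counts[choosePartition(key, numPartitions, roundRobinCounter)]++;
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up hostnames that differ only in a numeric suffix.
        List<String> hosts = Arrays.asList("host-app-0001", "host-app-0002", "host-app-0003");
        System.out.println("hostname key: "
                + Arrays.toString(countPerPartition(hosts, 100, 4, true)));
        System.out.println("null key:     "
                + Arrays.toString(countPerPartition(hosts, 100, 4, false)));
    }
}
```

With a hostname key, all messages from one container go to a single partition, so the spread depends entirely on how the hostnames hash. With a null key in this sketch, messages cycle through the partitions regardless of which container sent them.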
