jia-gao opened a new pull request, #1631:
URL: https://github.com/apache/samza/pull/1631

   Issue:
   Currently, diagnostics topics for samza jobs are created with a single 
partition. 
   For some jobs, this partition can grow large, so there may be a need to 
increase the partition count of the topic.
   However, DiagnosticsManager within the samza-li framework uses the hostname 
of the container as the partition key when emitting messages to Kafka. Because 
hostnames at LinkedIn are very similar, hashing them as the partition key will 
not distribute the messages evenly across partitions.
   
   Change: Emit diagnostics messages in round-robin fashion instead. 
This can be achieved by setting the partition key to null. 
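
   The difference between keyed and keyless sends can be illustrated with a 
minimal, standalone sketch. It is not the code in this PR: the class name, the 
hostnames, and the use of String.hashCode as a stand-in for the producer's real 
key hash are all assumptions made for illustration.

```java
import java.util.Arrays;

public class PartitionSketch {
    // Keyed send: the partition is a pure function of the key.
    // String.hashCode is only a stand-in for the producer's real key hash (assumption).
    static int keyedPartition(String key, int numPartitions) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    // Count messages per partition when every message is keyed by its host.
    static int[] countKeyed(String[] hosts, int messagesPerHost, int numPartitions) {
        int[] counts = new int[numPartitions];
        for (String host : hosts) {
            counts[keyedPartition(host, numPartitions)] += messagesPerHost;
        }
        return counts;
    }

    // Count messages per partition when keyless messages are spread round robin.
    static int[] countRoundRobin(int totalMessages, int numPartitions) {
        int[] counts = new int[numPartitions];
        for (int i = 0; i < totalMessages; i++) {
            counts[i % numPartitions]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical hostnames that differ only in a numeric suffix.
        String[] hosts = {"app-host-001", "app-host-002", "app-host-003"};
        int partitions = 8;
        System.out.println("keyed:       "
                + Arrays.toString(countKeyed(hosts, 100, partitions)));
        System.out.println("round robin: "
                + Arrays.toString(countRoundRobin(300, partitions)));
    }
}
```

   With a hostname key, all messages from one container land on a single 
partition, so a job can never use more partitions than it has distinct 
hostnames, and similar hostnames may collide further. How a real Kafka producer 
spreads null-key messages depends on the client version and the configured 
partitioner, so the round-robin loop above is an idealization.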
   
   Test Done:
   ./gradlew build

