Vanlightly commented on pull request #10534: URL: https://github.com/apache/pulsar/pull/10534#issuecomment-899386900
@equanz Let me know if I understand correctly. We publish aggregated stats, where we aggregate by partitioned producer, and optionally we also publish stats per partition with no aggregation. So the issue is that currently the stats collection code expects each partition to have the same number of producers on each partition, and uses that as a way of aggregating the stats for each partitioned producer. So we need another way of aggregating by partitioned producer. Moreover, it looks to me like the current strategy is already broken seeing as the numbers won't always match (neither numbers nor order) such as when producers come and go. Using producer name as an aggregation key is not good because if the producer name is not set by the user, then each internal producer will have a different producer name (generated by the broker). So the fix is to add another producer identity (producerStatsKey) that is shared across all producers of the same partitioned producer. Then aggregate on that. The producerStatsKey is set by the broker, to ensure it is globally unique. The partitioned producer creates a single producer when the partitioned producer is started. Then when additional internal producers for other partitions are required, they are created with the same producerStatsKey as the first producer. That way they all share the same key. Is that summary correct? Additional question: - Could we not use the producerName for aggregation and use the same strategy of starting a single producer, then making all others inherit their name from the first (when the user doesn't set the name)? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
