EugenDueck commented on issue #6209: [broker] add feature BrokerDeduplicationAcrossProducers
URL: https://github.com/apache/pulsar/pull/6209#issuecomment-583684456

@codelipenghui I think you are raising two points, which I'll address below. I'm pretty new to Pulsar and its concepts, so I may not yet understand the implications of all decisions. Let me explain in detail how I currently see it, and I'll be happy for you or anyone else to point out whatever I'm missing.

1) how easy it is for users to understand

This is a valid point. Personally, I do not feel that the concept of a groupMode implies that you can only have one group per groupMode. However, there is potential for confusion around "producerName" meaning "name of the producer group". That is the disadvantage of repurposing the producerName as the producer group name. On the other hand, it keeps the implementation simple, which I think is one of the reasons sijie preferred to use the identity of the producerName, if I understand him correctly.

Adding a producer group name of course adds implementation complexity, but I think it could also confuse users, because the possible and impossible combinations of producerName and producerGroupName would need to be documented (I guess it would be a hierarchy prodGroup/prod).

2) having multiple producer groups, each with their own deduplication settings

This is the way I currently see it: if we turn on deduplication for a topic, we have the same flexibility without producer group names that we would get with them:

1. If you want to deduplicate across a group of producers, give them the same producerName and set the groupMode to 'parallel'
2. for producers that don't want deduplication, use groupMode 'exclusive'
3. for failover producers use groupMode 'failover' ('failover' mode yet to be implemented)

The above groups can be freely mixed and matched within the same topic, which means you can have x parallel groups, y exclusive "groups", and z failover groups, all at the same time.

The only use case I can come up with that the current implementation cannot handle is this: one failover group that wants deduplication, while another group (failover or parallel) does not want deduplication on the same topic, or vice versa. But this should be orthogonal to the question of whether or not to have a producerGroupName: it is a question of the granularity of the deduplication setting (currently at the namespace and broker level), which we could make more fine-grained, down to the level of producer groups (although I wonder how many users would need that), regardless of whether we have a producerGroupName or use the producerName.

If there are any other use cases that I'm missing, I'd like to know. So unless I'm missing a use case that is impossible with the current implementation plus a finer-grained dedupe setting in the future, I would prefer properly documenting the new groupMode feature (similar to the way subscription types, aka subscription modes, are documented) over adding implementation complexity, which I believe would also need documentation to avoid confusion.
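To make the mix-and-match point concrete, here is a rough Java sketch of how several groups could coexist on one topic when the producerName doubles as the group name. It is only an illustration and not the code from this PR: the class `ProducerGroups`, the method `tryAdd`, and the `GroupMode` enum are hypothetical names, and the admission rules (an 'exclusive' name admits a single producer, and all members of a group must use the same mode) are my assumptions. Since 'failover' is not implemented yet, the sketch only models that several producers may share the name and says nothing about which one is active.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the groupMode idea; not the implementation in the PR.
class ProducerGroups {
    enum GroupMode { EXCLUSIVE, PARALLEL, FAILOVER }

    // Per producerName: the mode the group was created with and its member count.
    private static final class Group {
        final GroupMode mode;
        int members = 1;
        Group(GroupMode mode) { this.mode = mode; }
    }

    // One entry per producerName on a single topic.
    private final Map<String, Group> groups = new HashMap<>();

    /** Returns true if the producer may attach under the assumed rules. */
    boolean tryAdd(String producerName, GroupMode mode) {
        Group group = groups.get(producerName);
        if (group == null) {
            // First producer with this name creates the group.
            groups.put(producerName, new Group(mode));
            return true;
        }
        // Assumption: EXCLUSIVE admits one producer per name, and every
        // member of an existing group must use the group's mode.
        if (group.mode == GroupMode.EXCLUSIVE || group.mode != mode) {
            return false;
        }
        group.members++;
        return true;
    }

    /** Number of producers currently attached under this name. */
    int members(String producerName) {
        Group group = groups.get(producerName);
        return group == null ? 0 : group.members;
    }

    public static void main(String[] args) {
        ProducerGroups topic = new ProducerGroups();
        // Two producers share one parallel group.
        System.out.println(topic.tryAdd("ingest", GroupMode.PARALLEL));  // true
        System.out.println(topic.tryAdd("ingest", GroupMode.PARALLEL));  // true
        // An exclusive name admits only one producer.
        System.out.println(topic.tryAdd("audit", GroupMode.EXCLUSIVE));  // true
        System.out.println(topic.tryAdd("audit", GroupMode.EXCLUSIVE));  // false
        // A second, independent parallel group on the same topic.
        System.out.println(topic.tryAdd("metrics", GroupMode.PARALLEL)); // true
    }
}
```

In this model nothing limits the number of groups per mode; the only key is the producerName, which is the point of argument 1) above.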
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

With regards,
Apache Git Services
