EugenDueck commented on issue #6209: [broker] add feature 
BrokerDeduplicationAcrossProducers
URL: https://github.com/apache/pulsar/pull/6209#issuecomment-583684456
 
 
   @codelipenghui I think you are raising two points, which I'll address below. 
I'm pretty new to pulsar and its concepts, so maybe I'm not yet understanding 
the implications of all decisions. Let me just explain in detail how I 
currently see it, and I'll be happy for you or anyone to point out whatever it 
is that I'm missing.
   
   1) how easy it is for users to understand
   
   This is a valid point. Personally, I do not feel that the concept of a 
groupMode implies that you can only have one group per groupMode. However, 
there is potential for confusion around "producerName" meaning "name of the 
producer group". That is the disadvantage of repurposing the producerName as 
the producer group name. On the other hand, there is an advantage in terms of 
simplicity of implementation, which I think is one of the reasons sijie 
preferred to use the identity of the producerName, if I understand him 
correctly.
   
   Adding a producer group name of course adds complexity in terms of 
implementation, but I think it also has the potential to confuse users, because 
the possible and impossible combinations of producerName and producerGroupName 
would need to be documented (I guess it would be a hierarchy prodGroup/prod).
   
   2) having multiple producer groups, each with their own deduplication 
settings
   
   This is the way I currently see it: If we turn on deduplication for a topic, 
we have the same flexibility that we would get without producer group names:
   
   1. If you want to deduplicate across a group of producers, give them the 
same producerName and set the groupMode to 'parallel'
   1. For producers that don't want deduplication, use groupMode 'exclusive'
   1. For failover producers, use groupMode 'failover' ('failover' mode is yet 
to be implemented)
   
   And above groups can be freely mixed and matched within the same topic, 
which means you can have x parallel groups, y exclusive "groups", and z 
failover groups, all at the same time.
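   
   To make the mixing of groups concrete, here is a minimal toy model, not 
Pulsar code and not the code in this PR. It assumes that the broker tracks the 
highest accepted sequence id per producerName, that 'parallel' lets several 
producers attach under one producerName and share that tracking, and that 
'exclusive' allows only one producer per name. The names `ToyTopic`, `attach` 
and `publish` are hypothetical, and 'failover' is left out because it is not 
implemented yet.

```python
class ToyTopic:
    """Toy model of one topic with deduplication turned on (illustration only)."""

    def __init__(self):
        self.group_mode = {}  # producerName -> groupMode of the attached group
        self.last_seq = {}    # producerName -> highest sequence id accepted so far
        self.log = []         # payloads that were actually persisted

    def attach(self, producer_name, group_mode):
        # Only the two modes discussed as available are modelled here.
        if group_mode not in ("exclusive", "parallel"):
            raise ValueError(f"unsupported groupMode: {group_mode}")
        existing = self.group_mode.get(producer_name)
        # A name may be shared only if every holder uses 'parallel'.
        if existing is not None and not (existing == group_mode == "parallel"):
            raise RuntimeError(f"producerName already in use: {producer_name}")
        self.group_mode[producer_name] = group_mode

    def publish(self, producer_name, sequence_id, payload):
        # Assumed rule: drop anything at or below the highest sequence id
        # already accepted for this producerName.
        if sequence_id <= self.last_seq.get(producer_name, -1):
            return False
        self.last_seq[producer_name] = sequence_id
        self.log.append(payload)
        return True


topic = ToyTopic()
topic.attach("ingest", "parallel")   # first member of a parallel group
topic.attach("ingest", "parallel")   # second member, same producerName
topic.attach("audit", "exclusive")   # an exclusive "group" of one

first = topic.publish("ingest", 0, "a")      # accepted
duplicate = topic.publish("ingest", 0, "a")  # same id from the other member: dropped
other = topic.publish("audit", 0, "x")       # tracked separately per producerName
```

   In this model any number of parallel and exclusive groups can coexist on 
the topic, because the only shared state is keyed by producerName.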
   
   The only use case I can come up with that the current implementation cannot 
handle is this: One failover group that wants deduplication, while another 
group (failover or parallel) does not want deduplication on the same topic - or 
vice versa.
   
   But this should be orthogonal to the question of producerGroupName yes or 
no: It is a question of the granularity of the deduplication setting (currently 
at the namespace and broker level), which we could make more fine-grained up to 
the level of producer groups (although I wonder how many users would need 
that), regardless of whether we have a producerGroupName or if we use the 
producerName.
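   
   As a rough sketch of what a finer-grained setting could look like: the most 
specific level that has an explicit value wins. The function `dedup_enabled` 
and the per-group override are hypothetical; only the broker and namespace 
levels exist today, as noted above.

```python
from typing import Optional


def dedup_enabled(broker_default: bool,
                  namespace_setting: Optional[bool] = None,
                  group_setting: Optional[bool] = None) -> bool:
    """Resolve the effective dedupe setting, most specific level first.

    group_setting is a hypothetical per-producer-group override; None means
    "not set at this level, fall through to the next one".
    """
    for setting in (group_setting, namespace_setting):
        if setting is not None:
            return setting
    return broker_default


# Namespace turns dedupe on; one group opts out, another inherits the default.
group_a = dedup_enabled(False, namespace_setting=True, group_setting=False)
group_b = dedup_enabled(False, namespace_setting=True)
```

   The key for the per-group override could be the producerName just as well 
as a separate producerGroupName, which is why the two questions are independent.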
   
   If there are any other use cases that I'm missing, I'd like to know.
   
   So unless I'm missing a use case that is impossible even with the current 
implementation plus a finer-grained dedupe setting in the future, I would 
prefer properly documenting the new groupMode feature (similar to the way 
subscription types, aka subscription modes, are documented) over adding 
complexity in terms of implementation, which I believe would also need 
documentation to avoid confusion.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
