haphut commented on issue #6526:
URL: https://github.com/apache/pulsar/issues/6526#issuecomment-736804386


   Another variant of this problem occurs when an in-order pub-sub API, e.g. 
MQTT, or any other ephemeral event source feeds Pulsar.
   
   If we run only one instance of the Pulsar Producer or Pulsar Source and 
that instance crashes or has network issues, some messages might never reach 
Pulsar, because an ephemeral source will not replay them. An obvious HA 
solution is to run several identical Producer instances in parallel in 
different AZs, all feeding one Pulsar topic, so that the topic receives 
multiple copies of each message.
   
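   How then do we retain only one copy of each unique message? Keep-last topic 
compaction retains the latest copy of each key, so the surviving copy of a 
message can end up after messages that originally followed it, which breaks 
the original order.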
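
   To illustrate the ordering concern, here is a minimal sketch in plain 
Python. It models a topic as a list of message keys and is not Pulsar's 
compaction implementation; the function names and the sample feed are 
hypothetical. The feed assumes two redundant producers, one of which lags 
behind the other.

```python
def keep_last(keys):
    """Retain only the last copy of each key, in log order."""
    last_index = {k: i for i, k in enumerate(keys)}
    return [k for i, k in enumerate(keys) if last_index[k] == i]


def keep_first(keys):
    """Retain only the first copy of each key, in log order."""
    seen = set()
    out = []
    for k in keys:
        if k not in seen:
            seen.add(k)
            out.append(k)
    return out


# Producer A delivers m1, m2, m3; a lagging producer B delivers m1, m2 later.
feed = ["m1", "m2", "m1", "m3", "m2"]

print(keep_last(feed))   # ['m1', 'm3', 'm2']  (m2 now follows m3)
print(keep_first(feed))  # ['m1', 'm2', 'm3']  (original order preserved)
```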
   
   Instead we can use a Pulsar Function or a Consumer-Producer to implement 
something similar to keep-first compaction. The Function needs to keep state 
somewhere for the set of keys or hashes of already handled messages, maybe in 
BookKeeper or in another topic.
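
   The filtering logic of such a Function could be sketched as below. This is 
plain Python with in-memory state only; it does not use the Pulsar Functions 
API, and the names `KeepFirstFilter`, `content_key`, `dedup` and the 
`max_keys` limit are assumptions made for illustration. A real deployment 
would have to persist the set of seen keys, as described above.

```python
from collections import OrderedDict
import hashlib


class KeepFirstFilter:
    """Forward only the first copy of each message key; drop later copies."""

    def __init__(self, max_keys=100_000):
        # Bounded memory of handled keys, oldest first (hypothetical limit).
        self._seen = OrderedDict()
        self._max_keys = max_keys

    def accept(self, key):
        """Return True if the key is new and the message should be forwarded."""
        if key in self._seen:
            return False
        self._seen[key] = None
        if len(self._seen) > self._max_keys:
            self._seen.popitem(last=False)  # forget the oldest key
        return True


def content_key(payload: bytes) -> str:
    """Derive a key from the payload when the source provides no message ID."""
    return hashlib.sha256(payload).hexdigest()


def dedup(payloads, filt=None):
    """Return the payloads that a keep-first filter would forward, in order."""
    if filt is None:
        filt = KeepFirstFilter()
    return [p for p in payloads if filt.accept(content_key(p))]
```

   Hashing the payload treats two legitimately distinct messages with 
identical content as duplicates, so a key supplied by the source is 
preferable when one exists. Bounding the set of keys limits the state, but a 
duplicate that arrives after its key has been evicted passes through again.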
   
   The latter part of this HA feeder pattern would feel much more ergonomic 
with a built-in, keep-first, 
[on-the-fly](https://github.com/apache/pulsar/issues/6230) topic compaction.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

