haphut commented on issue #6526: URL: https://github.com/apache/pulsar/issues/6526#issuecomment-736804386
Another variant of this problem occurs when we use an in-order pub-sub API, e.g. MQTT, or any other ephemeral event source to feed Pulsar. If we run only one instance of the Pulsar Producer or Pulsar Source and that instance crashes or has network issues, some messages might never reach Pulsar.

An obvious HA solution is to run several identical Producer instances in parallel in different AZs, all feeding one Pulsar topic, so that the topic receives multiple copies of each message. How then do we retain only one copy of each unique message? Keep-last topic compaction could easily mess up the original order of the messages. Instead we can use a Pulsar Function or a Consumer-Producer to implement something similar to keep-first compaction. The Function needs to keep state somewhere for the set of keys or hashes of handled messages, maybe in BookKeeper or another topic.

The latter part of this HA feeder pattern would feel much more ergonomic with a built-in, keep-first, [on-the-fly](https://github.com/apache/pulsar/issues/6230) topic compaction.
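To make the keep-first idea concrete, here is a minimal sketch of the deduplication logic on its own, without any Pulsar client code. It assumes each message carries a key (or hash) that is identical across the redundant feeders. The names `KeepFirstDeduplicator`, `keep_first` and `max_keys` are hypothetical, and the seen-key set is held in memory purely for illustration; a real Function would need durable state, e.g. in BookKeeper or another topic as mentioned above.

```python
from collections import OrderedDict
from typing import Hashable, Iterable, Iterator, Tuple


class KeepFirstDeduplicator:
    """Accept the first message seen for each key and reject later copies.

    Assumption: state lives in memory and is bounded by max_keys. Once a key
    has been evicted, a late duplicate of it would be accepted again.
    """

    def __init__(self, max_keys: int = 100_000) -> None:
        self._seen: "OrderedDict[Hashable, None]" = OrderedDict()
        self._max_keys = max_keys

    def accept(self, key: Hashable) -> bool:
        """Return True if this key has not been handled yet."""
        if key in self._seen:
            return False
        self._seen[key] = None
        # Evict the oldest key once the bound is exceeded.
        if len(self._seen) > self._max_keys:
            self._seen.popitem(last=False)
        return True


def keep_first(
    messages: Iterable[Tuple[Hashable, str]],
    dedup: KeepFirstDeduplicator,
) -> Iterator[Tuple[Hashable, str]]:
    """Yield messages in arrival order, dropping repeated keys."""
    for key, payload in messages:
        if dedup.accept(key):
            yield key, payload


# Two redundant feeders publish the same events; the copies interleave.
merged = [("m1", "a"), ("m1", "a"), ("m2", "b"), ("m3", "c"), ("m2", "b")]
deduplicated = list(keep_first(merged, KeepFirstDeduplicator()))
```

Because only the first copy of each key is forwarded, the output keeps the arrival order of the first copies, which keep-last compaction would not guarantee. The bound on the key set is a trade-off: it caps memory, but it only works if duplicates arrive within the window of the last `max_keys` distinct keys.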
