Dealing with network/broker outages on the producer side is also something I've been trying to solve.
Having a hook for the producer to dump to a local file would probably be the simplest solution. In the event of a prolonged outage, this file could be replayed once availability is restored.

The approach I've been taking:

1) My bridge code between my data source and the Kafka producer writes everything to a local log file. When the bridge starts up, it generates a unique 8-character alphanumeric string. Each log entry it writes to the local file is prefixed with both that alphanumeric string and a log line number (0, 1, 2, 3, ...). The data already arrives with timestamps.

2) In the event of a network outage, or if Kafka can't keep up with the producer, I simply drop the Kafka messages. I never allow my data source to block while waiting on the Kafka producer/broker.

3) For given time ranges, my consumers track all the alphanumeric identifiers they consumed and the maximum complete sequence number they have seen, so I can manually go back to the producers and replay any lost data (whether it was never sent because of a network outage or died with a broker hardware failure). I basically go to the producer machine (which I track in the Kafka message body) and say: for time A to time B, I received data for these identifiers and max sequence numbers: (najeh2wh, 12312), (ji3njdKL, 71). Replay anything I'm missing.

I use random identifier strings because they save me from having to persist the number of log lines my producer has generated (robustness against producer failure).

- Niek

On Thu, Apr 12, 2012 at 7:12 AM, Edward Smith <esm...@stardotstar.org> wrote:
> Jun/Eric,
>
> [snip]
>
> However, we have a requirement to support HA. If I stick with the
> approach above, I have to worry about replication/mirroring the
> queues, which always gets sticky. We have to handle the case where a
> producer loses network connectivity, and so, must be able to queue
> locally at the producer, which, I believe, either means put the Kafka
> broker here or continue to use some 'homebrew' local queue. With
> brokers on the same node as producers, consumers only have to HA the
> results of their processing and I don't have to HA the queues.
>
> Any thoughts or feedback from the group is welcome.
>
> Ed
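For anyone curious, step 1 of Niek's scheme (a per-run random identifier plus a per-entry sequence number prefixed onto each local log line) could be sketched roughly like this in Python. The class and file layout here are illustrative assumptions, not code from any real library:

```python
import random
import string

ALPHANUMERIC = string.ascii_letters + string.digits

class LocalLog:
    """Hypothetical sketch: append-only local log for the producer bridge."""

    def __init__(self, path):
        # Unique 8-character alphanumeric run identifier, generated at startup.
        self.run_id = "".join(random.choice(ALPHANUMERIC) for _ in range(8))
        self.seq = 0  # per-run line number: 0, 1, 2, 3, ...
        self.file = open(path, "a")

    def append(self, payload):
        # Prefix each entry with the run id and sequence number; the payload
        # is assumed to already carry its own timestamp, as in the post.
        entry_seq = self.seq
        self.file.write("%s %d %s\n" % (self.run_id, entry_seq, payload))
        self.seq += 1
        return self.run_id, entry_seq
```

Because the identifier is random per process start, the producer never has to persist its last sequence number across restarts; a crash simply starts a new (run_id, 0) stream.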
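Step 2 (never block the data source on Kafka; drop messages when the broker is down or slow) could be approximated with a bounded hand-off queue in front of the producer thread. This is a hedged sketch with made-up names, not the actual bridge code:

```python
import queue

class NonBlockingSender:
    """Hypothetical sketch: bounded hand-off to a Kafka producer thread.

    When the buffer is full (broker unreachable or too slow), the message
    is dropped and counted; it remains recoverable from the local log.
    """

    def __init__(self, capacity=1000):
        self.buffer = queue.Queue(maxsize=capacity)
        self.dropped = 0

    def send(self, message):
        try:
            # Non-blocking enqueue: the data source is never stalled.
            self.buffer.put_nowait(message)
            return True
        except queue.Full:
            self.dropped += 1  # drop rather than block; replay later
            return False
```

The key design choice is that backpressure is converted into data loss on the Kafka path, which is acceptable only because every message already exists in the local log keyed by (run_id, sequence).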
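Step 3 hinges on the consumer computing, per identifier, the "maximum complete sequence number": the largest N such that every sequence 0..N was received. A minimal sketch of that bookkeeping, assuming in-memory sets (a real consumer would persist this per time window):

```python
class GapTracker:
    """Hypothetical sketch: track received sequences per run identifier."""

    def __init__(self):
        self.seen = {}  # run_id -> set of sequence numbers received

    def record(self, run_id, seq):
        self.seen.setdefault(run_id, set()).add(seq)

    def max_complete(self, run_id):
        # Largest N with 0..N all present; -1 if sequence 0 is missing.
        seqs = self.seen.get(run_id, set())
        n = 0
        while n in seqs:
            n += 1
        return n - 1
```

Reporting (run_id, max_complete) pairs back to the producer, as in the (najeh2wh, 12312) example, lets it replay everything after the highest contiguous sequence it actually delivered.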