Thanks, Tom, for sharing your idea. I see the following condition as necessary for the solution to be foolproof: after writing the data into the data partition, the server must do these two things atomically: 1. update the offset of the data partition, and 2. write the updated offset to the control partition.
This may not be possible to do atomically, since we need to write to two different files. I have glanced over the server code but don't have expertise in it. Can someone who is more versed in the server code ack/nack?

Also, can someone validate this alternative:
1. Write to the partition as usual. The data should contain a uid based on a timestamp/counter.
2. On producer reconnects/timeouts or server crashes, the producer should ask for the last committed message and extract the uid to decide whether the last message went through. Or maybe get this from ZooKeeper.

Thanks,
Rohit
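
P.S. To make the second idea concrete, here is a minimal sketch in Python. Everything in it is hypothetical and for illustration only: PartitionLog is an in-memory stand-in for a data partition, and Producer, send_with_recovery and the fault flag are names I made up, not the server's actual API. It also assumes a single producer per partition, since otherwise the last committed message could belong to someone else.

```python
import itertools


class PartitionLog:
    """Hypothetical in-memory stand-in for one data partition."""

    def __init__(self):
        self.messages = []  # list of (uid, payload) tuples

    def append(self, uid, payload):
        self.messages.append((uid, payload))

    def last_committed(self):
        # Returns the most recent (uid, payload), or None if the log is empty.
        return self.messages[-1] if self.messages else None


class Producer:
    """Hypothetical producer that tags each message with a counter-based uid."""

    def __init__(self, log, producer_id):
        self.log = log
        self.producer_id = producer_id
        self._counter = itertools.count(1)

    def _send(self, uid, payload, fault=None):
        # fault simulates a failure: "before_write" loses the message,
        # "after_write" loses only the acknowledgement.
        if fault == "before_write":
            raise TimeoutError("connection dropped before the write")
        self.log.append(uid, payload)
        if fault == "after_write":
            raise TimeoutError("ack lost after the write")

    def send_with_recovery(self, payload, fault=None):
        uid = (self.producer_id, next(self._counter))
        try:
            self._send(uid, payload, fault=fault)
        except TimeoutError:
            # Step 2 of the idea: ask for the last committed message and
            # resend only if its uid is not the one we just tried to write.
            last = self.log.last_committed()
            if last is None or last[0] != uid:
                self._send(uid, payload)
        return uid


log = PartitionLog()
producer = Producer(log, "p1")
producer.send_with_recovery("a", fault="after_write")   # written, ack lost
producer.send_with_recovery("b", fault="before_write")  # never written, resent
```

In both failure cases the log should end up with each message exactly once, because the uid check tells the producer whether a resend is needed.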