Hi all,

The use case is pretty simple. Let's say we have a history of events keyed as follows: key=userId, value=(timestamp, productId), and we want to remap it (just as we would with an internal topic) to: key=productId, value=(original_timestamp, userId).

Now, say I have 30 days of backlog and 2 partitions for the input topic. I spin up two instances and let them process the data from the start of time, but one instance is only half as powerful (less CPU, memory, etc.), such that instance 0 processes X events/sec while instance 1 processes X/2 events/sec.

My question is: are there *any* clients (Kafka Streams, Spark, Flink, etc., or otherwise) that would allow these two consumers to remain in sync *according to their timestamps*? I don't want to see events with an original_timestamp of today (from instance 0) interleaved with events from 15 days ago (from the underpowered instance 1). Yes, I realize this would bring my throughput down, but I am looking for any existing tech that would effectively say *"cap the time difference of events coming out of this repartition processor at 60 seconds max"*.

Currently, I am not aware of ANY open source solutions that do this for Kafka, but if someone has heard otherwise I would love to know. Alternately, perhaps this could lead to a KIP.

Thanks,
Adam
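P.S. To make the "cap at 60 seconds" idea concrete, here is a rough sketch of the scheduling decision I have in mind. This is purely hypothetical pseudologic, not taken from any existing client: given the event timestamp of the next (head) record on each partition, a coordinator would only allow polling of partitions whose head has not run more than the cap ahead of the slowest partition, pausing the rest until the slow one catches up.

```python
def pollable_partitions(head_timestamps, max_skew=60):
    """Decide which partitions a consumer may poll right now.

    head_timestamps: dict mapping partition id -> event timestamp (seconds)
    of the next unprocessed record, or None if the partition is drained.
    max_skew: the cap on event-time spread across partitions (hypothetical
    "60 seconds max" parameter from the question).

    Returns the set of partition ids whose head event is within max_skew
    of the slowest partition's head; partitions that have run further
    ahead are effectively paused until the low watermark advances.
    """
    live = {p: ts for p, ts in head_timestamps.items() if ts is not None}
    if not live:
        return set()
    # The slowest partition defines the low watermark in event time.
    low_watermark = min(live.values())
    return {p for p, ts in live.items() if ts - low_watermark <= max_skew}
```

In a real client this check would sit between fetches, e.g. driving the consumer's pause/resume calls on individual topic-partitions, so the fast instance idles while the underpowered one works through its backlog.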