Some context: Each k-v store has a changelog topic. The # of partitions in that changelog topic is equal to the # of tasks. Each task's K-V store will be mapped to a particular partition of that changelog topic. This mapping from taskNames-changeLogPartitionNumber is stored in coordinator stream.
Of course, you don't want this k-v changelog topic to keep growing. So, people configure it with some expiration. The expiration can either be: 1. Time retention: Records older than the retention are purged. 2. Compaction: Newer key-values will over-write older keys and only the most recent value is retained. I'm not sure if offsets are always monotonically increasing in Kafka or could change after a compaction/ a time based retention kicks in for the topic partition. On Sat, Jun 11, 2016 at 11:53 PM, David Yu <david...@optimizely.com> wrote: > My understanding of store changelog is that, each task writes store changes > to a particular changelog partition for that task. (Does that mean the > changelog keys are task names?) > > One thing that confuses me is that, the last offsets of some changelog > partitions do not move. I'm using the kafka GetOffsetShell tool to get the > last offsets for each partition. The result looks like this: > > partition offset > 0 7090 > 1 3737937 > 2 3733222 > 3 3719065 > 4 3730208 > 5 3731128 > 6 3734669 > 7 3691461 > 8 3759133 > 9 7286 > 10 3690347 > 11 3722450 > 12 7376 > 13 3738454 > 14 3742316 > 15 3710512 > 16 3777267 > 17 3750596 > 18 3728185 > 19 3694470 > > As you can see, three of the partitions barely got any updates. In fact, > the offsets stopped moving for a while. The traffic for each task should be > fairly balanced. I checked the task log and made sure that the stores for > these partitions are actively updated. > > Any idea why this is happening? Or am I missing something? > > Thanks, > David > -- Jagadish V, Graduate Student, Department of Computer Science, Stanford University