Hi Malcolm,

Using cleanup.policy=compact on the Kafka checkpoint topic should be sufficient, and it is the default when the topic is created by Samza. Under normal operation, a checkpoint topic should only contain roughly one message per task.
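As a first sanity check, you can confirm the policy is actually set on the topic. A rough sketch (topic name and connection string are placeholders; older Kafka versions may need --zookeeper <host:2181> instead of --bootstrap-server):

  ./bin/kafka-configs.sh --bootstrap-server localhost:9092 --describe \
    --entity-type topics --entity-name <your-checkpoint-topic>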
I can suggest the following ways to identify the issue:

1. Read the topic contents using kafka-console-consumer and check whether the extra size is due to incorrect entries (a second, non-Samza writer) or to duplicate entries for the same key (log compaction issues).

2. If you see duplicate keys, verify that Kafka's log compaction is kicking in and compacting stale entries. One sign that this is working is a sawtooth pattern in the Kafka topic partition size graph. You can also check the Kafka broker logs for any log-compaction-related error messages.

3. If log compaction isn't working, verify that the related Kafka topic / broker configurations are appropriate, e.g. log.cleaner.enable, log.cleaner.threads, min.cleanable.dirty.ratio, min/max.compaction.lag.ms, delete.retention.ms, etc.

I've sketched example commands for each of these steps at the bottom of this mail, below the quoted message.

Let us know if you are able to find any more details.

Thanks,
Prateek

On Tue, Nov 5, 2019 at 9:20 AM Malcolm McFarland <mmcfarl...@cavulus.com> wrote:

> Hey folks,
>
> We have cleanup.policy=compact set on our checkpoint topics. Even with
> this, we have almost 3 billion messages in some of these topics, and this
> is causing huge startup times. Are there any other settings we should set
> to optimize our startup times?
>
> Cheers,
> Malcolm McFarland
> Cavulus
>
>
> This correspondence is from HealthPlanCRM, LLC, d/b/a Cavulus. Any
> unauthorized or improper disclosure, copying, distribution, or use of the
> contents of this message is prohibited. The information contained in this
> message is intended only for the personal and confidential use of the
> recipient(s) named above. If you have received this message in error,
> please notify the sender immediately and delete the original message.
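P.S. Example commands for steps 1-3 above. These are rough sketches only: the topic name, bootstrap server, log path, and config values are placeholders for your deployment, and older Kafka versions may need --zookeeper instead of --bootstrap-server:

  # 1. Dump the checkpoint topic with keys so you can see who wrote what
  ./bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 \
    --topic <your-checkpoint-topic> --from-beginning \
    --property print.key=true

  # 2. Check on-disk partition sizes, and look for log cleaner errors on each broker
  ./bin/kafka-log-dirs.sh --bootstrap-server localhost:9092 --describe \
    --topic-list <your-checkpoint-topic>
  grep -iE "error|exception" <kafka-logs-dir>/log-cleaner.log

  # 3. Tighten topic-level compaction configs if needed (values here are only examples)
  ./bin/kafka-configs.sh --bootstrap-server localhost:9092 --alter \
    --entity-type topics --entity-name <your-checkpoint-topic> \
    --add-config min.cleanable.dirty.ratio=0.1,delete.retention.ms=86400000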