Chris Riccomini created SAMZA-388:
-------------------------------------
Summary: Log compaction on checkpoint topics fails with compression
Key: SAMZA-388
URL: https://issues.apache.org/jira/browse/SAMZA-388
Project: Samza
Issue Type: Bug
Components: kafka
Affects Versions: 0.8.0
Reporter: Chris Riccomini
I have a job that consumes from 10,000+ partitions. After SAMZA-123, it was
switched to the GroupBySystemStreamPartition strategy, which means it has
10,000+ tasks and thus sends 10,000+ checkpoint messages every minute.
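To put that rate in perspective, here is a rough back-of-the-envelope calculation in Python. It assumes exactly 10,000 tasks, each writing one checkpoint message per minute. The real job has more tasks, so treat the result as a lower bound. The function name is illustrative only.

```python
# Rough checkpoint volume for the job described above.
# Assumption: one checkpoint message per task per minute; 10,000 tasks is a lower bound.
TASKS = 10_000
COMMITS_PER_MINUTE = 1


def checkpoint_messages(minutes: int, tasks: int = TASKS,
                        commits_per_minute: int = COMMITS_PER_MINUTE) -> int:
    """Number of checkpoint messages written over `minutes` minutes."""
    return tasks * commits_per_minute * minutes


per_day = checkpoint_messages(60 * 24)
print(f"{per_day:,} checkpoint messages per day")  # 14,400,000 checkpoint messages per day
```

Without compaction, every one of those messages stays in the topic until normal time- or size-based retention removes it.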
To keep the checkpoint topic from getting too large, we enabled log compaction
on it, but we discovered that the topic still grew to be very large. This
happened because we were sending compressed messages to the Kafka checkpoint
topic.
Based on KAFKA-1374, it appears that we can't use compressed checkpoint topics
with log compaction.
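For reference, a simplified Python model of what log compaction is meant to do: retain only the most recent message for each key. It assumes each checkpoint message is keyed by its task. It ignores deletes (null-value tombstones), segment boundaries, and compression, so it is not Kafka's LogCleaner. The `compact` helper is hypothetical.

```python
from typing import Dict, Hashable, List, Tuple


def compact(log: List[Tuple[Hashable, object]]) -> List[Tuple[Hashable, object]]:
    """Keep only the latest value per key, ordered by the offset of that last write."""
    latest: Dict[Hashable, Tuple[int, object]] = {}
    for offset, (key, value) in enumerate(log):
        latest[key] = (offset, value)  # a later write for the same key supersedes earlier ones
    ordered = sorted(latest.items(), key=lambda kv: kv[1][0])
    return [(key, value) for key, (_, value) in ordered]


# Five rounds of checkpoints from three (hypothetical) tasks.
log = [(f"task-{t}", {"round": r}) for r in range(5) for t in range(3)]
compacted = compact(log)
print(len(log), len(compacted))  # 15 3
```

Under this model the compacted topic holds one message per task no matter how many rounds of checkpoints have been written. That is why unbounded growth with compressed messages was unexpected.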
I'm mostly opening this ticket as a placeholder for KAFKA-1374. Once that
ticket is resolved, we can update the Samza code to default the checkpoint
topics to be log compacted (with a small segment size), and not worry about
compression anymore.
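A hypothetical sketch of what those checkpoint topic defaults might look like, in Python. `cleanup.policy` and `segment.bytes` are standard Kafka topic-level settings. The 25 MB segment size and the helper names are illustrative assumptions, not Samza's actual defaults. A small segment size matters because Kafka's cleaner never compacts the active (currently written) segment, so smaller segments roll over sooner and become eligible for compaction.

```python
from typing import Dict, List

# Illustrative value only; not a Samza or Kafka default.
DEFAULT_SEGMENT_BYTES = 25 * 1024 * 1024


def checkpoint_topic_config(segment_bytes: int = DEFAULT_SEGMENT_BYTES) -> Dict[str, str]:
    """Topic-level settings for a log-compacted checkpoint topic (hypothetical helper)."""
    if segment_bytes <= 0:
        raise ValueError("segment_bytes must be positive")
    return {
        "cleanup.policy": "compact",        # keep only the latest message per key
        "segment.bytes": str(segment_bytes),  # small segments become cleanable sooner
    }


def as_cli_args(config: Dict[str, str]) -> List[str]:
    """Render settings as repeated `--config key=value` arguments."""
    args: List[str] = []
    for key, value in sorted(config.items()):
        args += ["--config", f"{key}={value}"]
    return args


print(as_cli_args(checkpoint_topic_config()))
```

The rendered arguments match the `--config key=value` form accepted by Kafka's `kafka-topics.sh` tool when creating a topic.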
--
This message was sent by Atlassian JIRA
(v6.2#6252)