Chris Riccomini created SAMZA-388:
-------------------------------------

             Summary: Log compaction on checkpoint topics fails with compression
                 Key: SAMZA-388
                 URL: https://issues.apache.org/jira/browse/SAMZA-388
             Project: Samza
          Issue Type: Bug
          Components: kafka
    Affects Versions: 0.8.0
            Reporter: Chris Riccomini


I have a job that consumes from 10,000+ partitions. After 
SAMZA-123, it was switched to the GroupBySystemStreamPartition strategy, 
which means it has 10,000+ tasks, and thus 10,000+ checkpoint messages 
sent every minute.

To keep the checkpoint topic from getting too large, we enabled log compaction 
on the Kafka topic, but we discovered that the topic still grew to be very 
large. This happened because we were sending compressed messages to the Kafka 
checkpoint topic.
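
To make the numbers concrete, here is a minimal Java sketch of what compaction is expected to do for a keyed checkpoint log. It is a toy in-memory model, not Kafka's log cleaner; the class name, the "task-N" keys, and the "offset-N" values are all made up for illustration. It assumes one checkpoint per task per minute, as described above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: models a keyed log in memory,
// not Kafka's actual log cleaner.
class CheckpointCompactionSketch {

    // Messages appended when every task writes one checkpoint per interval.
    static long appendedMessages(int tasks, int intervals) {
        return (long) tasks * intervals;
    }

    // Keeps only the most recent value for each key, which is the
    // end state a compacted log is expected to converge to.
    static Map<String, String> compact(List<String[]> log) {
        Map<String, String> latest = new LinkedHashMap<>();
        for (String[] record : log) {
            latest.put(record[0], record[1]); // later records overwrite earlier ones
        }
        return latest;
    }

    public static void main(String[] args) {
        int tasks = 10_000;
        int minutesPerDay = 24 * 60;
        System.out.println("appended per day: " + appendedMessages(tasks, minutesPerDay));

        // Small made-up log: 4 tasks checkpointing for 3 minutes.
        List<String[]> log = new ArrayList<>();
        for (int minute = 0; minute < 3; minute++) {
            for (int task = 0; task < 4; task++) {
                log.add(new String[] {"task-" + task, "offset-" + (minute * 100 + task)});
            }
        }
        Map<String, String> compacted = compact(log);
        System.out.println("log records: " + log.size()
                + ", after compaction: " + compacted.size());
    }
}
```

Without working compaction, the log grows with tasks times minutes. With it, the retained set is bounded by the number of tasks.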

Based on KAFKA-1374, it appears that we can't use compressed checkpoint topics 
with log compaction.

I'm mostly opening this ticket as a placeholder for KAFKA-1374. Once that 
ticket is resolved, we can update the Samza code to default the checkpoint 
topics to be log compacted (with a small segment size), and not worry about 
compression anymore.
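
As a rough sketch of what such a default might look like, the Java snippet below builds topic-level overrides using Kafka's "cleanup.policy" and "segment.bytes" topic configs. The class name, the helper method, and the 25 MB segment size are assumptions for illustration only; this ticket does not specify a value, and this is not Samza's actual code. A small segment matters because the log cleaner does not compact the active segment, so smaller segments roll and become eligible for cleaning sooner.

```java
import java.util.Properties;

// Hypothetical sketch: not Samza's actual checkpoint topic setup.
class CheckpointTopicConfigSketch {

    // Assumed value for illustration; the ticket only says "a small segment size".
    static final int ASSUMED_SEGMENT_BYTES = 25 * 1024 * 1024;

    // Builds Kafka topic-level overrides for a log-compacted checkpoint topic.
    static Properties checkpointTopicConfig(int segmentBytes) {
        Properties props = new Properties();
        props.setProperty("cleanup.policy", "compact"); // compact by key instead of deleting by age
        props.setProperty("segment.bytes", Integer.toString(segmentBytes));
        return props;
    }

    public static void main(String[] args) {
        Properties props = checkpointTopicConfig(ASSUMED_SEGMENT_BYTES);
        for (String name : props.stringPropertyNames()) {
            System.out.println(name + "=" + props.getProperty(name));
        }
    }
}
```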



--
This message was sent by Atlassian JIRA
(v6.2#6252)
