[
https://issues.apache.org/jira/browse/SAMZA-388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Chris Riccomini updated SAMZA-388:
----------------------------------
Attachment: SAMZA-388-1.patch
Attaching updated patch with changes from RB.
> Log compaction on checkpoint topics fails with compression
> ----------------------------------------------------------
>
> Key: SAMZA-388
> URL: https://issues.apache.org/jira/browse/SAMZA-388
> Project: Samza
> Issue Type: Bug
> Components: kafka
> Affects Versions: 0.8.0
> Reporter: Chris Riccomini
> Assignee: Chris Riccomini
> Attachments: SAMZA-388-0.patch, SAMZA-388-1.patch
>
>
> I have a job that has 10,000+ partitions that it's consuming from. After
> SAMZA-123, it's been switched to use the GroupBySystemStreamPartition
> strategy, which means it's got 10,000+ tasks, and thus, 10,000+ checkpoint
> messages being sent every minute.
> To keep the checkpoint topic from getting too large, we enabled log
> compaction on the Kafka topic, but the topic grew very large anyway. The
> cause was that we were sending compressed messages to the Kafka checkpoint
> topic.
> Based on KAFKA-1374, it appears that we can't use compressed checkpoint
> topics with log compaction.
> I'm mostly opening this ticket as a placeholder for KAFKA-1374. Once that
> ticket is resolved, we can update the Samza code to default the checkpoint
> topics to be log compacted (with a small segment size), and not worry about
> the compression anymore.
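
For illustration only, here is a minimal sketch of the settings discussed above: a log-compacted checkpoint topic with a small segment size, written to by a producer with compression turned off. It is not taken from the attached patches. The class name, the method names, and the 25 MB segment size are hypothetical. `cleanup.policy` and `segment.bytes` are Kafka topic-level settings; `compression.type` is the key used by Kafka's Java producer, and older producer clients may use a different key.

```java
import java.util.Properties;

// Hypothetical helper, not part of Samza: builds the Kafka settings
// described in this ticket as plain java.util.Properties.
public class CheckpointTopicConfigSketch {

    // Topic-level settings: compact the log, and roll segments at the given
    // size so that closed segments become eligible for cleaning sooner.
    static Properties topicConfig(long segmentBytes) {
        Properties props = new Properties();
        props.setProperty("cleanup.policy", "compact");
        props.setProperty("segment.bytes", Long.toString(segmentBytes));
        return props;
    }

    // Producer-side setting: send checkpoint messages uncompressed, which is
    // the workaround while KAFKA-1374 is unresolved.
    static Properties producerConfig() {
        Properties props = new Properties();
        props.setProperty("compression.type", "none");
        return props;
    }

    public static void main(String[] args) {
        // 25 MB is an assumed example value, not a recommendation.
        Properties topic = topicConfig(25L * 1024 * 1024);
        Properties producer = producerConfig();
        System.out.println("topic config: " + topic);
        System.out.println("producer config: " + producer);
    }
}
```

How these properties would be passed to Kafka (at topic creation time or through the job configuration) depends on the Samza and Kafka versions in use and is not shown here.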