[ https://issues.apache.org/jira/browse/SAMZA-388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Riccomini updated SAMZA-388:
----------------------------------

    Attachment: SAMZA-388-1.patch

Attaching updated patch with changes from RB.

> Log compaction on checkpoint topics fails with compression
> ----------------------------------------------------------
>
>                 Key: SAMZA-388
>                 URL: https://issues.apache.org/jira/browse/SAMZA-388
>             Project: Samza
>          Issue Type: Bug
>          Components: kafka
>    Affects Versions: 0.8.0
>            Reporter: Chris Riccomini
>            Assignee: Chris Riccomini
>         Attachments: SAMZA-388-0.patch, SAMZA-388-1.patch
>
>
> I have a job that consumes from 10,000+ partitions. After SAMZA-123, it
> was switched to the GroupBySystemStreamPartition strategy, which means it
> has 10,000+ tasks and therefore sends 10,000+ checkpoint messages every
> minute.
> To keep the checkpoint topic from getting too large, we enabled log
> compaction on the Kafka topic, but we discovered that the topic still grew
> to be very large. The cause was that we were sending compressed messages to
> the Kafka checkpoint topic.
> Based on KAFKA-1374, it appears that we can't use compressed checkpoint
> topics with log compaction.
> I'm mostly opening this ticket as a placeholder for KAFKA-1374. Once that
> ticket is resolved, we can update the Samza code to default the checkpoint
> topics to be log compacted (with a small segment size), and not worry about
> compression anymore.
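
To illustrate why compaction matters at this volume, here is a minimal Python sketch of what log compaction is meant to achieve: only the latest message per key survives. It is not Samza or Kafka code. The names `compact` and `simulate`, the `task-N` keys, and the use of the minute number as the checkpointed value are all hypothetical, and it assumes each checkpoint message is keyed by its task. It does not model the compression problem described in KAFKA-1374.

```python
from typing import Dict, Iterable, List, Tuple

# (key, value): hypothetical (task name, checkpointed value) pair.
Message = Tuple[str, int]


def compact(log: Iterable[Message]) -> List[Message]:
    """Keep only the most recent message per key, ordered by last write."""
    latest: Dict[str, Tuple[int, int]] = {}
    for position, (key, value) in enumerate(log):
        # A later write for the same key replaces the earlier one.
        latest[key] = (position, value)
    survivors = sorted(latest.items(), key=lambda item: item[1][0])
    return [(key, value) for key, (_position, value) in survivors]


def simulate(num_tasks: int, minutes: int) -> List[Message]:
    """Build a log in which every task writes one checkpoint per minute."""
    log: List[Message] = []
    for minute in range(minutes):
        for task in range(num_tasks):
            log.append((f"task-{task}", minute))
    return log


if __name__ == "__main__":
    log = simulate(num_tasks=10_000, minutes=60)
    compacted = compact(log)
    # One hour of checkpoints: 600,000 messages before, 10,000 after.
    print(len(log), len(compacted))
```

Under these assumptions, a compacted topic stays bounded by the number of tasks, while an uncompacted one grows by 10,000+ messages every minute.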



--
This message was sent by Atlassian JIRA
(v6.2#6252)
