[ 
https://issues.apache.org/jira/browse/SAMZA-226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14221094#comment-14221094
 ] 

Chris Riccomini commented on SAMZA-226:
---------------------------------------

Also, this patch will break hello-samza (latest). Please open a JIRA and fix 
that as well.

> Auto-create changelog streams for kv
> ------------------------------------
>
>                 Key: SAMZA-226
>                 URL: https://issues.apache.org/jira/browse/SAMZA-226
>             Project: Samza
>          Issue Type: Bug
>          Components: container, kv
>    Affects Versions: 0.8.0
>            Reporter: Chris Riccomini
>            Assignee: Naveen
>             Fix For: 0.8.0
>
>         Attachments: rb28016 (1).patch, rb28016 (2).patch, rb28016.patch
>
>
> Currently, changelog topics are not auto-created. This is a frustrating user 
> experience, and several useful defaults are not obvious when creating Kafka 
> topics with log compaction enabled.
> We should have Samza auto-create changelog streams for the kv stores that 
> have changelogs enabled.
> In Kafka's case, the changelog topics should be created with compaction 
> enabled. They should also be created with a smaller (100 MB) default 
> [segment.bytes|http://kafka.apache.org/documentation.html#configuration] 
> setting. The smaller segment.bytes setting is useful for low-volume 
> changelogs. The problem we've seen in the past is that the default 
> log.segment.bytes is 1 GB. Kafka's compaction implementation NEVER touches 
> the most recent log segment. This means that, if you have a very small state 
> store, but execute a lot of deletes/updates (e.g. you've only got maybe 25 
> MB of active state, but are deleting and updating it frequently), you will 
> always end up with at LEAST 1 GB of state to restore (since the most recent 
> segment will always contain non-compacted writes). This is silly since your 
> active (compacted) state is really only ~25 MB. Shrinking the segment bytes 
> means that you'll have a smaller maximum data size to restore. The trade-off 
> here is that we'll have more segment files for changelogs, which will 
> increase the number of open file handles.
> The trick is doing this in a generic way, since we are supporting changelogs 
> for more than just Kafka systems. I think the interface to do the stream 
> creation belongs in the SystemAdmin interface. It would be nice to have a 
> generic SystemAdmin.createStream() interface, but this would require giving 
> it kafka-specific configuration. Another option is to have 
> SystemAdmin.createChangelogStream, but this seems a bit hacky at first 
> glance. We need to think this part through.
> [~martinkl], in hello-samza, how are we creating log compacted state stores 
> with the appropriate number of partitions? Is this handled as part of 
> bin/grid?
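
To make the proposal in the description concrete, here is a minimal sketch of 
what a changelog-specific creation hook could look like. Everything in it is 
an assumption for illustration: ChangelogAdmin, KafkaChangelogAdminSketch and 
the createChangelogStream(name, partitions) signature are hypothetical and are 
not the SystemAdmin API, and no Kafka client is called. The only names taken 
from Kafka are the topic-level settings cleanup.policy and segment.bytes. The 
100 MB value is interpreted here as 100 * 1024 * 1024 bytes.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical interface: the system-agnostic hook discussed in the issue.
// It takes no Kafka-specific configuration; each system picks its own defaults.
interface ChangelogAdmin {
    void createChangelogStream(String streamName, int numPartitions);
}

// Hypothetical Kafka-flavoured implementation. It only records what it would
// create; a real implementation would call Kafka's topic-creation tooling.
class KafkaChangelogAdminSketch implements ChangelogAdmin {
    // Assumed reading of "100mb": 100 MiB, versus the 1 GB broker default.
    static final long CHANGELOG_SEGMENT_BYTES = 100L * 1024 * 1024;

    final Map<String, Map<String, String>> created = new LinkedHashMap<>();
    final Map<String, Integer> partitionCounts = new LinkedHashMap<>();

    // Topic-level overrides for a changelog: compaction on, smaller segments.
    static Map<String, String> changelogTopicConfig() {
        Map<String, String> config = new LinkedHashMap<>();
        config.put("cleanup.policy", "compact");
        config.put("segment.bytes", Long.toString(CHANGELOG_SEGMENT_BYTES));
        return config;
    }

    @Override
    public void createChangelogStream(String streamName, int numPartitions) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive");
        }
        created.put(streamName, changelogTopicConfig());
        partitionCounts.put(streamName, numPartitions);
    }
}

class ChangelogSketch {
    public static void main(String[] args) {
        KafkaChangelogAdminSketch admin = new KafkaChangelogAdminSketch();
        admin.createChangelogStream("my-store-changelog", 4);
        // prints {my-store-changelog={cleanup.policy=compact, segment.bytes=104857600}}
        System.out.println(admin.created);
    }
}
```

The point of the narrower interface is that the caller only names the stream 
and its partition count, so the Kafka-specific settings stay inside the Kafka 
implementation instead of leaking into a generic createStream() call.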



