[ https://issues.apache.org/jira/browse/SAMZA-226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Chris Riccomini updated SAMZA-226:
----------------------------------
    Fix Version/s:     (was: 0.8.0)
                   0.9.0

> Auto-create changelog streams for kv
> ------------------------------------
>
>                 Key: SAMZA-226
>                 URL: https://issues.apache.org/jira/browse/SAMZA-226
>             Project: Samza
>          Issue Type: Bug
>          Components: container, kv
>    Affects Versions: 0.8.0
>            Reporter: Chris Riccomini
>            Assignee: Naveen
>             Fix For: 0.9.0
>
>         Attachments: rb28016 (1).patch, rb28016 (2).patch, rb28016.patch
>
>
> Currently, changelog topics are not auto-created. This is a frustrating user
> experience, and there are a few useful defaults that should be set that are
> not obvious when creating Kafka topics with log compaction enabled.
> We should have Samza auto-create changelog streams for the kv stores that
> have changelogs enabled.
> In Kafka's case, the changelog topics should be created with compaction
> enabled. They should also be created with a smaller (100mb) default
> [segment.bytes|http://kafka.apache.org/documentation.html#configuration]
> setting. The smaller segment.bytes setting is useful for low-volume
> changelogs. The problem we've seen in the past is that the default
> log.segment.bytes is 1 gig. Kafka's compaction implementation NEVER touches
> the most recent log segment. This means that, if you have a very small state
> store, but execute a lot of deletes/updates (e.g. you've only got maybe 25
> megs of active state, but are deleting and updating it frequently), you will
> always end up with at LEAST 1 gig of state to restore (since the most recent
> segment will always contain non-compacted writes). This is silly since your
> active (compacted) state is really only ~25 megs. Shrinking the segment bytes
> means that you'll have a smaller maximum data size to restore. The trade off
> here is that we'll have more segment files for changelogs, which will
> increase file handles.
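
To make the defaults described above concrete, here is a minimal Java sketch of the topic-level overrides a changelog topic might carry, plus a rough estimate of how much data a restore could have to read. `cleanup.policy` and `segment.bytes` are Kafka topic-level configuration keys. The class name `ChangelogTopicDefaults`, both method names, the reading of "100mb" as 100 MiB, and the simplified restore model (compacted state plus one full, uncompacted active segment) are assumptions made for illustration; none of this is taken from the attached patches.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ChangelogTopicDefaults {
    // Assumption: "100mb" in the issue is read as 100 MiB.
    static final long SEGMENT_BYTES = 100L * 1024 * 1024;

    // Topic-level overrides for a changelog topic: compaction on, smaller segments.
    static Map<String, String> topicConfig() {
        Map<String, String> config = new LinkedHashMap<>();
        config.put("cleanup.policy", "compact");
        config.put("segment.bytes", Long.toString(SEGMENT_BYTES));
        return config;
    }

    // Simplified model (an assumption): the compacted state plus one full
    // active segment, which compaction does not touch.
    static long roughRestoreBound(long compactedStateBytes, long segmentBytes) {
        return compactedStateBytes + segmentBytes;
    }

    public static void main(String[] args) {
        long activeState = 25L * 1024 * 1024;   // ~25 megs of live state
        long oneGig = 1024L * 1024 * 1024;      // the 1 gig default from the issue

        System.out.println("topic config: " + topicConfig());
        System.out.println("bound with 1 gig segments:  "
                + roughRestoreBound(activeState, oneGig) + " bytes");
        System.out.println("bound with 100mb segments: "
                + roughRestoreBound(activeState, SEGMENT_BYTES) + " bytes");
    }
}
```

Under this model the segment size, not the live state, dominates restore cost for small stores, which is the argument for the smaller default.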
> The trick is doing this in a generic way, since we are supporting changelogs
> for more than just Kafka systems. I think the interface to do the stream
> creation belongs in the SystemAdmin interface. It would be nice to have a
> generic SystemAdmin.createStream() interface, but this would require giving
> it kafka-specific configuration. Another option is to have
> SystemAdmin.createChangelogStream, but this seems a bit hacky at first
> glance. We need to think this part through.
> [~martinkl], in hello-samza, how are we creating log compacted state stores
> with the appropriate number of partitions? Is this handled as part of
> bin/grid?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
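
The two API shapes weighed in the issue can be compared with a small Java sketch. Everything here is hypothetical: the interfaces `GenericAdmin` and `ChangelogAdmin`, their method signatures, and the in-memory `FakeKafkaAdmin` are illustrations of the trade-off, not Samza's actual SystemAdmin interface and not the design in the attached patches.

```java
import java.util.HashMap;
import java.util.Map;

public class StreamCreationSketch {
    /** Option 1 (hypothetical): generic call; the caller must supply system-specific settings. */
    interface GenericAdmin {
        void createStream(String streamName, int partitions, Map<String, String> systemConfig);
    }

    /** Option 2 (hypothetical): purpose-specific call; each system picks its own changelog defaults. */
    interface ChangelogAdmin {
        void createChangelogStream(String streamName, int partitions);
    }

    /** In-memory stand-in for a Kafka-like system; it only records what would be created. */
    static class FakeKafkaAdmin implements GenericAdmin, ChangelogAdmin {
        final Map<String, Map<String, String>> created = new HashMap<>();

        @Override
        public void createStream(String streamName, int partitions, Map<String, String> systemConfig) {
            Map<String, String> entry = new HashMap<>(systemConfig);
            // The partition count is stored next to the config purely for inspection.
            entry.put("partitions", Integer.toString(partitions));
            // Creating an existing stream is a no-op, so repeated startup calls are harmless.
            created.putIfAbsent(streamName, entry);
        }

        @Override
        public void createChangelogStream(String streamName, int partitions) {
            // The Kafka-specific defaults live inside the Kafka implementation,
            // so callers never see them.
            Map<String, String> defaults = new HashMap<>();
            defaults.put("cleanup.policy", "compact");
            defaults.put("segment.bytes", Long.toString(100L * 1024 * 1024));
            createStream(streamName, partitions, defaults);
        }
    }

    public static void main(String[] args) {
        FakeKafkaAdmin admin = new FakeKafkaAdmin();
        admin.createChangelogStream("my-store-changelog", 4);
        System.out.println(admin.created);
    }
}
```

In this sketch the generic call pushes Kafka knowledge onto the caller, while the changelog-specific call keeps it inside the system implementation at the cost of a narrower, single-purpose method.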