[ https://issues.apache.org/jira/browse/KAFKA-1275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13923015#comment-13923015 ]
Jay Kreps commented on KAFKA-1275:
----------------------------------

Hey Joe/Joel, I clarified the guarantees and fixed the typos. Thanks.

Log compaction will take more space than just the raw compacted data in the time between compactions. This is bounded by min.cleanable.dirty.ratio (i.e. a dirty ratio of 50% implies 2x space usage).

The part about the compaction strategy is interesting. The compaction is always by key--older entries for the same primary key are discarded. This implies a usage strategy where the new value contains the full information. You can think of this like physical logging in databases where the full after-image of the row might be logged after an update. You are describing the possibility of making this pluggable so that fancier things could be done (aggregation or what have you). I am skeptical of the operational feasibility of this in an "as a service" deployment, though: basically I don't want a bunch of custom user code running on a centrally managed cluster and I don't want to be deploying new aggregation jars each time this logic changes. So I am a little scared of that. :-)

> fixes for quickstart documentation
> ----------------------------------
>
>                 Key: KAFKA-1275
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1275
>             Project: Kafka
>          Issue Type: Bug
>          Components: website
>    Affects Versions: 0.8.1
>            Reporter: Evan Zacks
>            Assignee: Jay Kreps
>            Priority: Minor
>              Labels: documentation
>             Fix For: 0.8.1
>
>         Attachments: KAFKA-1275-quickstart-doc.patch
>
>
> The quickstart guide refers to commands that no longer exist in the master
> git branch per changes in KAFKA-554.
> If changes for the documentation to match 0.8.1 are already in development
> elsewhere, please feel free to discard this issue.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
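
For illustration only, here is a minimal Python sketch of the two points in the comment above: compaction by key (older entries for the same key are discarded, so each surviving value must carry the full state) and the space overhead implied by min.cleanable.dirty.ratio. This is not Kafka's log cleaner. The function names `compact` and `worst_case_space_multiplier` are hypothetical, and the bound assumes that the dirty ratio is dirty bytes divided by total log bytes and that, in the worst case, every dirty byte is redundant.

```python
def compact(log):
    """Keep only the latest record for each key, preserving offset order.

    `log` is a list of (key, value) pairs; the list index plays the role
    of the offset. Hypothetical helper, not Kafka's cleaner.
    """
    latest_offset = {}
    for offset, (key, _value) in enumerate(log):
        latest_offset[key] = offset  # later offsets overwrite earlier ones
    return [
        (key, value)
        for offset, (key, value) in enumerate(log)
        if latest_offset[key] == offset
    ]


def worst_case_space_multiplier(dirty_ratio):
    """Log size relative to fully compacted size just before cleaning.

    Assumes dirty_ratio = dirty / (clean + dirty), so
    total = clean / (1 - dirty_ratio).
    """
    if not 0 <= dirty_ratio < 1:
        raise ValueError("dirty_ratio must be in [0, 1)")
    return 1.0 / (1.0 - dirty_ratio)


log = [("user-1", "a@x"), ("user-2", "b@x"), ("user-1", "c@x")]
compacted = compact(log)           # older "user-1" entry is discarded
multiplier = worst_case_space_multiplier(0.5)  # the 50% case from the comment
```

Under these assumptions a dirty ratio of 0.5 gives a multiplier of 2, which matches the "2x space usage" figure quoted in the comment.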