[ https://issues.apache.org/jira/browse/KAFKA-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14933603#comment-14933603 ]
Jay Kreps commented on KAFKA-2592:
----------------------------------

I think the impl strategy that Samza has is pretty good: it keeps an in-memory object cache which flushes to Kafka and RocksDB when it gets full or when commit is called. This buffer is nice because a very common situation is doing lots of increments with a small number of very common keys. So this accomplishes the following:

1. All writes can be batch writes, which are about 3x faster in RocksDB.
2. Duplicate updates to the same key get deduped (only the last update is written to Kafka/RocksDB).
3. Since it is an object cache, there is no serialization cost for access.

Basically, these make the common case of store.put(key, store.get(key) + 1) really fast.

> Stop Writing the Change-log in store.put() / delete() for Non-transactional Store
> ---------------------------------------------------------------------------------
>
>                 Key: KAFKA-2592
>                 URL: https://issues.apache.org/jira/browse/KAFKA-2592
>             Project: Kafka
>          Issue Type: Sub-task
>            Reporter: Guozhang Wang
>            Assignee: Yasuhiro Matsuda
>             Fix For: 0.9.0.0
>
>
> Today we keep a dirty threshold and try to send to change-log in store.put()
> / delete() when the threshold has been exceeded. Doing this will largely
> increase the likelihood of inconsistent state upon unclean shutdown.
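To make the cache strategy from the comment concrete, here is a minimal Java sketch of a write-back object cache that dedupes updates per key and flushes them as one batch, both to the change-log and to the backing store, when the buffer fills or on commit. It is not Samza's or Kafka's actual code: `BatchStore`, `Changelog`, `CachingStore`, and the `capacity` parameter are hypothetical names, and the sketch assumes the backing store exposes a batch `putAll`. For brevity it only buffers dirty entries (a real cache would likely also retain clean entries, e.g. in an LRU) and omits delete().

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the persistent store (e.g. RocksDB) that accepts batch writes.
interface BatchStore<K, V> {
    V get(K key);
    void putAll(Map<K, V> batch);
}

// Hypothetical stand-in for the change-log producer (e.g. a Kafka topic).
interface Changelog<K, V> {
    void send(K key, V value);
}

// Sketch of a write-back object cache; not the Samza or Kafka Streams API.
class CachingStore<K, V> {
    private final int capacity;
    private final BatchStore<K, V> store;
    private final Changelog<K, V> changelog;
    // Pending (dirty) writes held as live objects; a later put to the same
    // key replaces the earlier one, so only the last value is flushed.
    private final Map<K, V> buffer = new LinkedHashMap<>();

    CachingStore(int capacity, BatchStore<K, V> store, Changelog<K, V> changelog) {
        this.capacity = capacity;
        this.store = store;
        this.changelog = changelog;
    }

    V get(K key) {
        // Buffered objects are returned directly, with no deserialization.
        if (buffer.containsKey(key)) {
            return buffer.get(key);
        }
        return store.get(key);
    }

    void put(K key, V value) {
        buffer.put(key, value);
        // Flush only when the buffer holds `capacity` distinct keys.
        if (buffer.size() >= capacity) {
            flush();
        }
    }

    // Called on commit: everything buffered goes out in one batch.
    void commit() {
        flush();
    }

    private void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        // One change-log record per distinct key, carrying its latest value.
        for (Map.Entry<K, V> e : buffer.entrySet()) {
            changelog.send(e.getKey(), e.getValue());
        }
        store.putAll(new LinkedHashMap<>(buffer)); // a single batch write
        buffer.clear();
    }
}
```

Used as a counter, the common store.put(key, store.get(key) + 1) pattern on one hot key with `capacity = 100` never leaves the buffer: 1,000 increments followed by `commit()` produce a single change-log record and a single batch write holding the final value 1000.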