[
https://issues.apache.org/jira/browse/KAFKA-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14933603#comment-14933603
]
Jay Kreps commented on KAFKA-2592:
----------------------------------
I think Samza's implementation strategy is pretty good: it keeps an in-memory
object cache that flushes to Kafka and RocksDB when it gets full or when
commit is called. This buffer is nice because a very common situation is doing
lots of increments on a small number of very common keys. So this
accomplishes the following:
1. All writes can be batch writes, which are about 3x faster in RocksDB
2. Duplicate updates to the same key get deduped (only the last update is
written to Kafka/RocksDB)
3. Since it is an object cache, there is no serialization cost for access
Basically these make the common case of
store.put(key, store.get(key) + 1)
really fast.
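
To make the three points above concrete, here is a minimal sketch of such a
write-back object cache. It is not Samza's or Kafka's actual code: BatchSink,
RecordingSink, and WriteBackCache are hypothetical names, the sink stands in
for RocksDB plus the change-log, and the sketch assumes single-threaded
access, non-null values, and no deletes. It also only buffers dirty entries,
so a read miss always goes to the sink.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for the underlying store plus change-log. */
interface BatchSink<K, V> {
    V read(K key);                     // consulted on a cache miss
    void writeBatch(Map<K, V> batch);  // one batched write per flush
}

/** In-memory sink that records every batch it receives (for illustration only). */
class RecordingSink<K, V> implements BatchSink<K, V> {
    final Map<K, V> data = new HashMap<>();
    final List<Map<K, V>> batches = new ArrayList<>();

    public V read(K key) {
        return data.get(key);
    }

    public void writeBatch(Map<K, V> batch) {
        batches.add(batch);
        data.putAll(batch);
    }
}

/** Buffers dirty entries as plain objects and flushes them in one batch. */
class WriteBackCache<K, V> {
    private final Map<K, V> dirty = new LinkedHashMap<>();
    private final BatchSink<K, V> sink;
    private final int capacity;

    WriteBackCache(BatchSink<K, V> sink, int capacity) {
        this.sink = sink;
        this.capacity = capacity;
    }

    /** Returns the pending object if there is one, with no serialization involved. */
    V get(K key) {
        V pending = dirty.get(key);
        return pending != null ? pending : sink.read(key);
    }

    /** Overwrites any earlier pending update to the same key (dedup). */
    void put(K key, V value) {
        dirty.put(key, value);
        if (dirty.size() >= capacity) {
            flush();  // buffer is full
        }
    }

    /** Flushes everything that is still pending. */
    void commit() {
        flush();
    }

    private void flush() {
        if (dirty.isEmpty()) {
            return;
        }
        sink.writeBatch(new LinkedHashMap<>(dirty));  // hand over a copy as one batch
        dirty.clear();
    }
}
```

With this sketch, ten increments of one key through a WriteBackCache with a
large capacity, followed by commit(), reach the sink as a single batch
containing only the final value for that key.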
> Stop Writing the Change-log in store.put() / delete() for Non-transactional
> Store
> ---------------------------------------------------------------------------------
>
> Key: KAFKA-2592
> URL: https://issues.apache.org/jira/browse/KAFKA-2592
> Project: Kafka
> Issue Type: Sub-task
> Reporter: Guozhang Wang
> Assignee: Yasuhiro Matsuda
> Fix For: 0.9.0.0
>
>
> Today we keep a dirty threshold and try to send to the change-log in
> store.put() / delete() when the threshold has been exceeded. Doing this
> greatly increases the likelihood of inconsistent state upon unclean shutdown.