[ 
https://issues.apache.org/jira/browse/KAFKA-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14933603#comment-14933603
 ] 

Jay Kreps commented on KAFKA-2592:
----------------------------------

I think the implementation strategy that Samza has is pretty good. It keeps an 
in-memory object cache that flushes to Kafka and RocksDB when it gets full or 
when commit is called. This buffer is nice because a very common situation is 
doing lots of increments on a small number of very common keys. So this 
accomplishes the following:
1. All writes can be batch writes, which are about 3x faster in RocksDB
2. Duplicate updates to the same key get deduped (only the last update is 
written to Kafka/RocksDB)
3. Since it is an object cache, there is no serialization cost for access

Basically these make the common case of 
  store.put(key, store.get(key) + 1)
really fast.
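
To make that concrete, here is a rough sketch of such a write-back cache. It 
is not Samza's actual code: BatchStore, RecordingStore and WriteBackCache are 
made-up names, the backing store is a plain in-memory stand-in for RocksDB 
plus the change-log, and only dirty entries are cached (no deletes, no 
caching of clean reads).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical backing store: stands in for the RocksDB + change-log pair.
interface BatchStore<K, V> {
    V get(K key);
    void putAll(Map<K, V> batch);
}

// In-memory stand-in that records every batch it receives.
class RecordingStore<K, V> implements BatchStore<K, V> {
    final Map<K, V> data = new HashMap<>();
    final List<Map<K, V>> batches = new ArrayList<>();

    public V get(K key) {
        return data.get(key);
    }

    public void putAll(Map<K, V> batch) {
        // Copy the batch, since the caller clears its map after flushing.
        batches.add(new LinkedHashMap<>(batch));
        data.putAll(batch);
    }
}

// Write-back object cache: holds deserialized values, keeps only the last
// update per key, and flushes dirty entries as one batch when the dirty set
// is full or when commit() is called.
class WriteBackCache<K, V> {
    private final BatchStore<K, V> store;
    private final int maxDirty;
    private final Map<K, V> dirty = new LinkedHashMap<>();

    WriteBackCache(BatchStore<K, V> store, int maxDirty) {
        this.store = store;
        this.maxDirty = maxDirty;
    }

    V get(K key) {
        // Serve from the cache first: no serialization, no store access.
        V cached = dirty.get(key);
        return cached != null ? cached : store.get(key);
    }

    void put(K key, V value) {
        // Overwriting an existing dirty entry is what dedupes updates.
        dirty.put(key, value);
        if (dirty.size() >= maxDirty) {
            flush();
        }
    }

    void commit() {
        flush();
    }

    private void flush() {
        if (dirty.isEmpty()) {
            return;
        }
        store.putAll(dirty); // one batch write instead of one write per put
        dirty.clear();
    }
}

class WriteBackCacheDemo {
    public static void main(String[] args) {
        RecordingStore<String, Integer> store = new RecordingStore<>();
        WriteBackCache<String, Integer> cache = new WriteBackCache<>(store, 100);
        // The common case: store.put(key, store.get(key) + 1)
        for (int i = 0; i < 1000; i++) {
            Integer old = cache.get("clicks");
            cache.put("clicks", old == null ? 1 : old + 1);
        }
        cache.commit();
        System.out.println(store.batches.size() + " batch, value " + store.get("clicks"));
    }
}
```

In this sketch the 1000 increments to one key reach the backing store as a 
single batch holding a single entry, written at commit time.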

> Stop Writing the Change-log in store.put() / delete() for Non-transactional 
> Store
> ---------------------------------------------------------------------------------
>
>                 Key: KAFKA-2592
>                 URL: https://issues.apache.org/jira/browse/KAFKA-2592
>             Project: Kafka
>          Issue Type: Sub-task
>            Reporter: Guozhang Wang
>            Assignee: Yasuhiro Matsuda
>             Fix For: 0.9.0.0
>
>
> Today we keep a dirty threshold and try to send to the change-log in 
> store.put() / delete() when the threshold has been exceeded. Doing this 
> greatly increases the likelihood of inconsistent state upon unclean shutdown.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
