[ 
https://issues.apache.org/jira/browse/SAMZA-428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14160488#comment-14160488
 ] 

Roger Hoover commented on SAMZA-428:
------------------------------------

Hi Jay and all,

Do you think that state like this could be made fault tolerant and still 
performant by batching?  That way the serialization and Kafka write overhead is 
amortized over many records.

I'm wondering if Samza should have better support for batching.  There are 
times when jobs need to make remote calls and could probably get much better 
throughput with batch reads (multi-get) and batch writes.  Batching can 
probably be done using the one-at-a-time API, but I'm just wondering aloud if 
batch support can/should be easier.  Certainly, the MessageChooser API should 
allow you to select a mixed batch rather than everything from a single source 
topic per batch.
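
To make the "batching on top of the one-at-a-time API" idea concrete, here is 
a rough sketch in plain Java.  It is not Samza code: BatchingLookup, add(), 
flush() and the multiGet function are hypothetical names made up for 
illustration, and the remote store is just a function from a list of keys to 
a map of results.  The per-message callback only buffers the key, and one 
remote call is made per full batch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/**
 * Hypothetical helper (not part of Samza): buffers keys that arrive one at a
 * time and resolves them with a single multi-get per batch.
 */
final class BatchingLookup<K, V> {
    private final int maxBatchSize;
    // Assumed stand-in for a remote batch read (multi-get).
    private final Function<List<K>, Map<K, V>> multiGet;
    private final List<K> pending = new ArrayList<>();
    private int remoteCalls = 0;

    BatchingLookup(int maxBatchSize, Function<List<K>, Map<K, V>> multiGet) {
        if (maxBatchSize < 1) {
            throw new IllegalArgumentException("maxBatchSize must be >= 1");
        }
        this.maxBatchSize = maxBatchSize;
        this.multiGet = multiGet;
    }

    /**
     * Called once per incoming message. Returns an empty map until the batch
     * is full, then returns the results for the whole batch.
     */
    Map<K, V> add(K key) {
        pending.add(key);
        return pending.size() >= maxBatchSize ? flush() : Collections.<K, V>emptyMap();
    }

    /**
     * Resolves whatever is buffered with one remote call. Intended to also be
     * called from a timer so a partial batch does not wait forever.
     */
    Map<K, V> flush() {
        if (pending.isEmpty()) {
            return Collections.emptyMap();
        }
        List<K> batch = new ArrayList<>(pending);
        pending.clear();
        remoteCalls++;
        return multiGet.apply(batch);
    }

    /** Number of remote round trips made so far. */
    int remoteCalls() {
        return remoteCalls;
    }
}
```

In a job, add() would be called from the per-message process callback and 
flush() from a periodic window callback.  One caveat with this approach: 
buffered messages have been consumed but not yet acted on, so the buffer 
presumably has to be flushed before offsets are checkpointed, which is part 
of why first-class batch support might be nicer.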

> Investigate: how to tune down caching in the KeyValueStore implementations
> --------------------------------------------------------------------------
>
>                 Key: SAMZA-428
>                 URL: https://issues.apache.org/jira/browse/SAMZA-428
>             Project: Samza
>          Issue Type: Improvement
>          Components: kv
>    Affects Versions: 0.8.0
>            Reporter: Chinmay Soman
>             Fix For: 0.8.0
>
>
> Currently, we have a 'CachedStore' layer on top of the KeyValueStore 
> implementation that we use. This might lead to double caching:
> i) Once at the CachedStore layer
> ii) Possibly cached again in the specific K-V store that we use (e.g. 
> RocksDB / BDB)
> We need the CachedStore layer so that the writes to LoggedStore (if 
> configured) are done in an efficient manner. 
> We can then potentially do some config tuning for the K-V store to reduce its 
> memory footprint and simply write to disk. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
