[ 
https://issues.apache.org/jira/browse/SAMZA-256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14048376#comment-14048376
 ] 

Chris Riccomini commented on SAMZA-256:
---------------------------------------

bq. Keeping that pattern would allow us to break out RocksDB, LevelDB or other 
implementations that bring substantial baggage, into their own modules, thus 
making it easier to keep that baggage sorted and segregated

Yea, that was my main knock on the second approach. It seems to muddle 
dependencies between the different implementations.

bq. Would it be better to have a separate LevelDBKeyValueStorageEngineFactory, 
RocksDBKeyValueStorageEngineFactory, HashMapKeyValueStorageEngineFactory, etc?

The main thing that bugs me about the multiple-factory approach is that they're 
all going to cargo cult almost exactly the same chunk of code. If you look at 
KeyValueStorageEngineFactory, about 99% of it is setting up the cache, 
serialization, changelog, etc. This is stuff that every key value store is 
going to want to do identically. One way around that would be to have the 
KeyValueStorageEngineFactory be a base class for all the rest, and just have 
some abstract method that returns the underlying KV store (i.e. LevelDB, 
RocksDB, etc).
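
To make the base-class idea concrete, here's a rough sketch. Every name in it 
(SketchKvStore, SharedLayersStore, BaseKvEngineFactory, createRawStore) is made 
up for illustration and is not Samza's actual API. The wrapper only counts 
writes, standing in for the real cache/serde/changelog setup.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical minimal KV interface, not Samza's actual KeyValueStore API.
interface SketchKvStore {
    String get(String key);
    void put(String key, String value);
}

// Stand-in for the shared layers (cache, serialization, changelog).
// Here it only counts writes so that the wrapping is observable.
final class SharedLayersStore implements SketchKvStore {
    private final SketchKvStore inner;
    private int writes = 0;

    SharedLayersStore(SketchKvStore inner) { this.inner = inner; }

    @Override public String get(String key) { return inner.get(key); }
    @Override public void put(String key, String value) { writes++; inner.put(key, value); }
    int writeCount() { return writes; }
}

// Base factory: the common setup is written once, here.
abstract class BaseKvEngineFactory {
    SketchKvStore getStorageEngine(String storeName) {
        return new SharedLayersStore(createRawStore(storeName));
    }

    // The only thing each implementation has to supply.
    protected abstract SketchKvStore createRawStore(String storeName);
}

// One concrete factory. A RocksDB or LevelDB factory would differ only in this method.
final class TreeMapKvEngineFactory extends BaseKvEngineFactory {
    @Override
    protected SketchKvStore createRawStore(String storeName) {
        final Map<String, String> data = new TreeMap<>();
        return new SketchKvStore() {
            @Override public String get(String key) { return data.get(key); }
            @Override public void put(String key, String value) { data.put(key, value); }
        };
    }
}

final class BaseFactoryDemo {
    public static void main(String[] args) {
        SketchKvStore store = new TreeMapKvEngineFactory().getStorageEngine("demo");
        store.put("k", "v");
        System.out.println(store.get("k"));
    }
}
```

With this shape, each store-specific module would only need to ship its own 
small subclass, which also keeps the per-store dependencies separate.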

On the flip side of this, I could imagine some KV stores wanting to slightly 
tweak the getStorageEngine method. For example, there's no point in having a 
cache or serialized key value store when using an in-memory TreeMap 
implementation. Having different factories lets us control this a bit better, 
maybe.
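
One way to get that control without fully separate factories might be an 
overridable hook on a shared base class. This is just a sketch under assumed 
names (TinyStore, TweakableFactory, decorate); none of them exist in Samza, 
and the "disk-backed" factory uses a TreeMap as a placeholder for a real 
RocksDB/LevelDB store.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical minimal KV interface for this sketch.
interface TinyStore {
    String get(String key);
    void put(String key, String value);
}

// Plain in-memory store backed by a TreeMap.
final class MapStore implements TinyStore {
    private final Map<String, String> data = new TreeMap<>();
    @Override public String get(String key) { return data.get(key); }
    @Override public void put(String key, String value) { data.put(key, value); }
}

// Stand-in for the cache and serialization layers; it just delegates.
final class CachedStoreSketch implements TinyStore {
    private final TinyStore inner;
    CachedStoreSketch(TinyStore inner) { this.inner = inner; }
    @Override public String get(String key) { return inner.get(key); }
    @Override public void put(String key, String value) { inner.put(key, value); }
}

abstract class TweakableFactory {
    TinyStore getStorageEngine() {
        return decorate(createRawStore());
    }

    protected abstract TinyStore createRawStore();

    // Default: add the shared layers. Subclasses may override to skip them.
    protected TinyStore decorate(TinyStore raw) {
        return new CachedStoreSketch(raw);
    }
}

// Placeholder for a persistent store factory; keeps the default wrapping.
final class DiskBackedFactorySketch extends TweakableFactory {
    @Override protected TinyStore createRawStore() { return new MapStore(); }
}

// In-memory factory: a cache in front of a TreeMap buys nothing, so skip it.
final class InMemoryFactorySketch extends TweakableFactory {
    @Override protected TinyStore createRawStore() { return new MapStore(); }
    @Override protected TinyStore decorate(TinyStore raw) { return raw; }
}

final class TweakableFactoryDemo {
    public static void main(String[] args) {
        System.out.println(new DiskBackedFactorySketch().getStorageEngine().getClass().getSimpleName());
        System.out.println(new InMemoryFactorySketch().getStorageEngine().getClass().getSimpleName());
    }
}
```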

bq. stores.*.factory.persistent=true / false

This config approach only works if we draw a line in the sand that RocksDB will 
be the only persistent storage engine supported by the KeyValueStorageEngine 
implementation. In such a case, true=use RocksDB, and false=use the TreeMap 
implementation. However, if we ever wanted to support another persistent 
implementation, you then end up needing a second parameter. Something like:

stores.*.factory.persistent.store=rocksdb|leveldb|lmdb|etc
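
For illustration, resolving that second parameter could look roughly like the 
sketch below. The StoreSelectorSketch class and its registry are hypothetical, 
the returned strings stand in for real store instances, and treating a missing 
or unknown value as an error is my assumption rather than anything decided 
here.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical helper: picks a store implementation by configured name.
final class StoreSelectorSketch {
    // Engine name -> constructor. Strings stand in for real store instances.
    private final Map<String, Supplier<String>> engines = new HashMap<>();

    void register(String name, Supplier<String> ctor) {
        engines.put(name, ctor);
    }

    String select(Map<String, String> config, String storeName) {
        String key = "stores." + storeName + ".factory.persistent.store";
        String engine = config.get(key);
        Supplier<String> ctor = (engine == null) ? null : engines.get(engine);
        if (ctor == null) {
            // Assumption: a missing or unrecognized engine name is a config error.
            throw new IllegalArgumentException("No store registered for " + key + "=" + engine);
        }
        return ctor.get();
    }
}

final class StoreSelectorDemo {
    public static void main(String[] args) {
        StoreSelectorSketch selector = new StoreSelectorSketch();
        selector.register("rocksdb", () -> "rocksdb-store");
        selector.register("leveldb", () -> "leveldb-store");

        Map<String, String> config = new HashMap<>();
        config.put("stores.my-store.factory.persistent.store", "leveldb");
        System.out.println(selector.select(config, "my-store"));
    }
}
```

Note that every supported engine has to be known to the one selector, which 
is where the dependency muddling comes from.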

I'm not sure if this is better or worse than approach (1), though.

> Provide in-memory data store implementation
> -------------------------------------------
>
>                 Key: SAMZA-256
>                 URL: https://issues.apache.org/jira/browse/SAMZA-256
>             Project: Samza
>          Issue Type: Improvement
>          Components: kv
>    Affects Versions: 0.6.0
>            Reporter: Jakob Homan
>            Assignee: Chinmay Soman
>             Fix For: 0.8.0
>
>
> The sole current kv store, LevelDbKeyValueStore, works well when the amount 
> of data to be stored is prohibitively large to keep it all in memory.  
> However, in cases where the state is small enough to comfortably fit in 
> whatever memory is available, it would be better to provide an in-memory 
> implementation.  This can be backed by either a native Java class, or perhaps 
> a Guava class, if that is found to scale better (or, of course, the backing 
> implementation could be configurable).



--
This message was sent by Atlassian JIRA
(v6.2#6252)
