[ 
https://issues.apache.org/jira/browse/SAMZA-424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14153788#comment-14153788
 ] 

Chris Riccomini commented on SAMZA-424:
---------------------------------------

bq. Notice that we don’t have delete() functionality as part of this API since 
the expiration/eviction will be done as defined in the config.

Is there anything prohibiting us from adding delete()? There seem to be cases 
where a developer might want to forcibly delete a key (for example, to 
trigger a refresh and guarantee they get the newest data).
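For concreteness, a delete() on the proposed API could look like the sketch below. The interface and method names are my own illustration (plus a trivial in-memory implementation), not something from the design doc:

```java
import java.util.HashMap;
import java.util.Map;

public class CacheSketch {
  // Hypothetical cache contract with an explicit delete().
  public interface Cache<K, V> {
    V get(K key);
    void put(K key, V value);
    // Forcibly remove a key, e.g. to trigger a refresh on the next get().
    void delete(K key);
  }

  // Trivial in-memory implementation, for illustration only.
  public static class InMemoryCache<K, V> implements Cache<K, V> {
    private final Map<K, V> map = new HashMap<>();
    public V get(K key) { return map.get(key); }
    public void put(K key, V value) { map.put(key, value); }
    public void delete(K key) { map.remove(key); }
  }
}
```

Eviction still happens per the configured policy; delete() just gives the developer an escape hatch.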

bq. In addition, 'all()' and 'range()' have been left out since they can wreak 
havoc with the expiration/eviction.

I think this depends on the underlying cache implementation. It seems like we 
might be able to implement a cache where all/range don't affect the key's TTL. 
All(), in particular, seems like it might be useful for a lot of things (for 
example, to see what you have in your cache without a priori knowledge of the 
keys or an incoming message).

bq. The KeyValueStore<K,V> can be refactored to extend from this interface and 
add the missing operations.

It seems odd to have the KV store implement Cache. Seems more like the Cache 
should be called KeyValueStore, and the existing KV store should be named 
RangeKeyValueStore, or something. This is a backwards incompatible change for 
folks that were using all/range, but I feel like that might be OK.
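A sketch of that naming, with a toy TreeMap-backed implementation; all names here are hypothetical illustrations, not Samza's actual interfaces:

```java
import java.util.Iterator;
import java.util.Map;
import java.util.TreeMap;

public class StoreSketch {
  // Hypothetical base contract: the minimal get/put/delete operations.
  public interface KeyValueStore<K, V> {
    V get(K key);
    void put(K key, V value);
    void delete(K key);
  }

  // The range-capable store extends the base, not the other way around.
  public interface RangeKeyValueStore<K, V> extends KeyValueStore<K, V> {
    Iterator<Map.Entry<K, V>> all();
    Iterator<Map.Entry<K, V>> range(K from, K to);  // [from, to)
  }

  // Minimal TreeMap-backed implementation, for illustration only.
  public static class TreeStore<K, V> implements RangeKeyValueStore<K, V> {
    private final TreeMap<K, V> map = new TreeMap<>();
    public V get(K key) { return map.get(key); }
    public void put(K key, V value) { map.put(key, value); }
    public void delete(K key) { map.remove(key); }
    public Iterator<Map.Entry<K, V>> all() { return map.entrySet().iterator(); }
    public Iterator<Map.Entry<K, V>> range(K from, K to) {
      return map.subMap(from, to).entrySet().iterator();
    }
  }
}
```

Under this shape a cache is just a KeyValueStore with an eviction policy, and nothing forces it to promise all()/range().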

bq. timeToLiveSeconds

TTL seems specific to the eviction policy. LFU might want something else.

bq. When the container is re-started, the StorageEngine 'restore' method is 
used to fill up the cache.

One slightly confusing point is how TTL interacts with restoration. Should TTL 
be reset when the cache is restored? I can see wanting to use a durable cache 
but keep the TTLs from before the failure. That way, a cache restored after 
several days of downtime comes back already expired, while entries whose TTLs 
have not yet run out remain usable. This would protect against short downtimes, 
where the cache is still safe to use, but force a full refresh after longer 
downtimes.

I'm not quite sure how TTL preservation would work, though. It seems you'd have 
to preserve the data in the Kafka message, but then that gets into 
implementation specific streams, which we've shied away from.
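One possible shape for this, sketched below with purely illustrative names: store an absolute expiry timestamp alongside each value, so restore() re-inserts the original deadline rather than granting a fresh TTL. Time is passed in explicitly to keep the sketch deterministic.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch: TTLs survive a restart because each entry carries an
// absolute deadline, which is what would be serialized to the changelog.
public class TtlCache<K, V> {
  private static class Entry<V> {
    final V value;
    final long expiresAtMs;  // absolute deadline, survives serialization
    Entry(V value, long expiresAtMs) { this.value = value; this.expiresAtMs = expiresAtMs; }
  }

  private final Map<K, Entry<V>> map = new HashMap<>();

  public void put(K key, V value, long ttlMs, long nowMs) {
    map.put(key, new Entry<>(value, nowMs + ttlMs));
  }

  // Lazily expire on read.
  public V get(K key, long nowMs) {
    Entry<V> e = map.get(key);
    if (e == null) return null;
    if (nowMs >= e.expiresAtMs) { map.remove(key); return null; }
    return e.value;
  }

  // On restore, re-insert the original absolute deadline: after a short
  // downtime entries are still live; after a long one they are already expired.
  public void restore(K key, V value, long expiresAtMs) {
    map.put(key, new Entry<>(value, expiresAtMs));
  }
}
```

The open question from above still stands: the deadline would have to ride along in the changelog message, which pulls us toward implementation-specific stream formats.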

bq. In addition, the restore() method then has to figure out a way to populate 
the cache (since we cannot use the putAll method).

This could be handled the same way that the LevelDB implementation works. The 
"rawDb" is what's passed to the restore method. It's a bit of a hack, but it 
works.

One thing to consider is how this implementation would integrate with the 
global state store implementation that we end up building in SAMZA-402. This 
was @sriram's main point of feedback. Ideally, we don't want to end up with two 
different, but almost identical, things. The main spot where I see the risk of 
confusion is in config. For example, you propose an "isShared" config, which is 
something that the global state implementation could use as well.

Another example: does it make sense to use the Cache store (e.g. SamzaEhCache) 
as the store for a global state store? It seems like it might. This would 
effectively give you a rolling buffer (with some eviction policy: windowed, 
msSinceWrite-based, or size-based?) for a read-only global store. Seems kind of 
mind bending. This could, for example, be a very easy way to implement a 
map-side join. You could define a global state store that uses a cache store 
with a window-based (say, 5 minute) eviction policy, and join against it.

Note: After seeing the Word+PDF design doc in action, I definitely prefer a 
text-based source with a formatted PDF print. Word makes copy/paste very 
difficult (at least on my Mac).

Nit: The config proposal is using camel case. We've been using dotted lower 
case for config. I'm assuming this is just a doc detail, and not an 
implementation proposal.
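For illustration, the dotted lower-case form might look like the following (the key names are hypothetical, not a concrete proposal):

```
# Hypothetical dotted-lower-case equivalents of the doc's camelCase options.
stores.my-cache.eviction.policy=lru
stores.my-cache.ttl.seconds=300
stores.my-cache.shared=true
```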

> Add a Cache state API to the Samza container
> --------------------------------------------
>
>                 Key: SAMZA-424
>                 URL: https://issues.apache.org/jira/browse/SAMZA-424
>             Project: Samza
>          Issue Type: New Feature
>          Components: container
>            Reporter: Chinmay Soman
>            Assignee: Chinmay Soman
>         Attachments: SAMZA-424-Cache-API_0.pdf
>
>
> There are cases when the user code needs access to a 'cache' which can be 
> used to store custom data. This cache is different from the KeyValue store in 
> the following ways:
> * At the very least, needs to support LRU (Least Recently Used) and TTL (Time 
> To Live) eviction strategies
> * May not support all() and range() operations (since this wreaks havoc with 
> the eviction operation)
> * Needs to exist at a per task or a per container level.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
