@Gyula

Can you explain a bit what this KeyValue store would do more then the
partitioned key/value state?

On Tue, Sep 8, 2015 at 2:49 PM, Gábor Gévay <gga...@gmail.com> wrote:

> Hello,
>
> As for use cases, in my old job at Ericsson we were building a
> streaming system that was processing data from telephone networks, and
> it was using key-value stores a LOT. For example, keeping track of
> various state info of the users (which cell are they currently
> connected to, what bearers do they have, ...); mapping from IDs of
> users in one subsystem of the telephone network to the IDs of the same
> users in an other subsystem; mapping from IDs of phone calls to lists
> of IDs of participating users; etc.
> So I imagine they would like this a lot. (At least, if they were
> considering moving to Flink :))
>
> Best,
> Gabor
>
>
>
>
> 2015-09-08 13:35 GMT+02:00 Gyula Fóra <gyf...@apache.org>:
> > Hey All,
> >
> > The last couple of days I have been playing around with the idea of
> > building a streaming key-value store abstraction using stateful streaming
> > operators that can be used within Flink Streaming programs seamlessly.
> >
> > Operations executed on this KV store would be fault tolerant as it
> > integrates with the checkpointing mechanism, and if we add timestamps to
> > each put/get/... operation we can use the watermarks to create fully
> > deterministic results. This functionality is very useful for many
> > applications, and is very hard to implement properly with some dedicates
> kv
> > store.
> >
> > The KVStore abstraction could look as follows:
> >
> > KVStore<K,V> store = new KVStore<>;
> >
> > Operations:
> >
> > store.put(DataStream<Tuple2<K,V>>)
> > store.get(DataStream<K>) -> DataStream<KV<K,V>>
> > store.remove(DataStream<K>) -> DataStream<KV<K,V>>
> > store.multiGet(DataStream<K[]>) -> DataStream<KV<K,V>[]>
> > store.getWithKeySelector(DataStream<X>, KeySelector<X,K>) ->
> > DataStream<KV<X,V>[]>
> >
> > For the resulting streams I used a special KV abstraction which let's us
> > return null values.
> >
> > The implementation uses a simple streaming operator for executing most of
> > the operations (for multi get there is an additional merge operator) with
> > either local or partitioned states for storing the kev-value pairs (my
> > current prototype uses local states). And it can either execute
> operations
> > eagerly (which would not provide deterministic results), or buffer up
> > operations and execute them in order upon watermarks.
> >
> > As for use cases you can probably come up with many I will save that for
> > now :D
> >
> > I have a prototype implementation here that can execute the operations
> > described above (does not handle watermarks and time yet):
> >
> > https://github.com/gyfora/flink/tree/KVStore
> > And also an example job:
> >
> >
> https://github.com/gyfora/flink/blob/KVStore/flink-staging/flink-streaming/flink-streaming-core/src/test/java/org/apache/flink/streaming/api/KVStreamExample.java
> >
> > What do you think?
> > If you like it I will work on writing tests and it still needs a lot of
> > tweaking and refactoring. This might be something we want to include with
> > the standard streaming libraries at one point.
> >
> > Cheers,
> > Gyula
>

Reply via email to