Yes, as Kostas said, it would initially nor provide more functionality but it would enable us to add it later.
On a side not, why would you call it KvState? And what would be called KvState? > On 14 Dec 2015, at 11:14, Kostas Tzoumas <ktzou...@apache.org> wrote: > > I suppose that they can start as sugar and evolve to a different > implementation. > > I would +1 the name change to KVState, OperatorState is indeed somewhat > confusing, and it will only get harder to rename later. > > On Mon, Dec 14, 2015 at 11:09 AM, Gyula Fóra <gyula.f...@gmail.com> wrote: > >> Would the Reducing/Folding states just be some API sugar on top of what we >> have know (ValueState) or does it have some added functionality (like >> incremental checkpoints for list states)? >> >> Gyula >> >> Aljoscha Krettek <aljos...@apache.org> ezt írta (időpont: 2015. dec. 14., >> H, 11:03): >> >>> While enhancing the state interfaces we would also need to introduce new >>> types of state. I was thinking of these, for a start: >>> - ValueState (works like OperatorState works now, i.e. provides methods >>> to get/set one state value >>> - ListState, proves methods to add one element to a list of elements and >>> to iterate over all contained elements >>> - ReducingState, somewhat similar to value state but combines the added >>> value to the existing value using a ReduceFunction >>> - FoldingState, same as above but with fold >>> >>> I think these are necessary to give the system more knowledge about the >>> semantics of state so that it can handle the state more efficiently. >> Think >>> of incremental checkpoints, for example, these are easy to do if you know >>> that state is a list to which stuff is only appended. >>>> On 14 Dec 2015, at 10:52, Stephan Ewen <se...@apache.org> wrote: >>>> >>>> A lot of this makes sense, but I am not sure about renaming >>>> "OperatorState". The other name is nicer, but why make users' life hard >>>> just for a name? >>>> >>>> >>>> On Mon, Dec 14, 2015 at 10:46 AM, Maximilian Michels <m...@apache.org> >>> wrote: >>>> >>>>> Hi Aljoscha, >>>>> >>>>> Thanks for the informative technical description. >>>>> >>>>>> - function state: this is the state that you get when a user function >>>>> implements the Checkpointed interface. it is not partitioned >>>>>> - operator state: This is the state that a StreamOperator can >> snapshot, >>>>> it is similar to the function state, but for operators. it is not >>>>> partitioned >>>>>> - partitioned state: state that is scoped to the key of the incoming >>>>> element, in Flink, this is (confusingly) called OperatorState and >>> KvState >>>>> (internally) >>>>> >>>>> Let's clean that up! Let's rename the OperatorState interface to >>> KvState. >>>>> >>>>>> Both stream operators and user functions can have partitioned state, >>> and >>>>> the namespace is the same, i.e. the state can clash. The partitioned >>> state >>>>> will stay indefinitely if not manually cleared. >>>>> >>>>> I suppose operators currently have to take care to use a unique >>>>> identifier for the state such that it doesn't clash with the user >>>>> function. Wouldn't be too hard to introduce a scoping here. >>>>> >>>>> Your proposal makes sense. It seems like this is a rather delicate >>>>> change which improves the flexibility of the streaming API. What is >>>>> the motivation behind this? I suppose you are thinking of improvements >>>>> to the session capabilities of the streaming API. >>>>> >>>>>> If we want to also implement the current WindowOperator on top of >> these >>>>> generic facilities we need to have a way to scope state not only by >> key >>> but >>>>> also by windows (or better, some generic state scope). >>>>> >>>>> This is currently handled by the WindowOperator itself and would then >>>>> be delegated to the enhanced state interface? Makes sense if we want >>>>> to make use of the new state interface. Again, is it just cleaner or >>>>> does this enable new type of applications? >>>>> >>>>> Cheers, >>>>> Max >>>>> >>>>> On Thu, Dec 10, 2015 at 4:47 PM, Aljoscha Krettek < >> aljos...@apache.org> >>>>> wrote: >>>>>> Hi All, >>>>>> I want to discuss some ideas about improving the >> primitives/operations >>>>> that Flink offers for user-state, timers and windows and how these >>> concepts >>>>> can be unified. >>>>>> >>>>>> It has come up a lot lately that people have very specific >> requirements >>>>> regarding the state that they keep and it seems necessary to allows >>> users >>>>> to set their own custom timers (on processing time and watermark time >>>>> (event-time)) to do both expiration of state and implementation of >>> custom >>>>> windowing semantics. While we’re at this, we might also think about >>>>> cleaning up the state handling a bit. >>>>>> >>>>>> Let me first describe the status quo, so that we’re all on the same >>>>> page. There are three types of state: >>>>>> - function state: this is the state that you get when a user function >>>>> implements the Checkpointed interface. it is not partitioned >>>>>> - operator state: This is the state that a StreamOperator can >> snapshot, >>>>> it is similar to the function state, but for operators. it is not >>>>> partitioned >>>>>> - partitioned state: state that is scoped to the key of the incoming >>>>> element, in Flink, this is (confusingly) called OperatorState and >>> KvState >>>>> (internally) >>>>>> >>>>>> (Operator is the low-level concept, user functions are usually >> invoked >>>>> by the operator, for example StreamMap is the operator that handles a >>>>> MapFunction.) >>>>>> >>>>>> Function state and operator state is not partitioned, meaning that it >>>>> becomes difficult when we want to implement dynamic >> scale-in/scale-out. >>>>> With partitioned state it can be redistributed when changing the >> degree >>> of >>>>> parallelism. >>>>>> >>>>>> Both stream operators and user functions can have partitioned state, >>> and >>>>> the namespace is the same, i.e. the state can clash. The partitioned >>> state >>>>> will stay indefinitely if not manually cleared. >>>>>> >>>>>> On to timers, operators can register processing-time callbacks, they >>> can >>>>> react to watermarks to implement event-time callbacks. They have to >>>>> implement the logic themselves, however. For example, the >> WindowOperator >>>>> has custom code to keep track of watermark timers and for reacting to >>>>> watermarks. User functions have no way of registering timers. Also, >>> timers >>>>> are not scoped to any key. So if you register a timer while processing >>> an >>>>> element of a certain key, when the timer fires you don’t know what key >>> was >>>>> active when registering the timer. This might be necessary for >> cleaning >>> up >>>>> state for certain keys, or to trigger processing for a certain key >> only, >>>>> for example with session windows of some kind. >>>>>> >>>>>> Now, on to new stuff. I propose to add a timer facility that can be >>> used >>>>> by both operators and user functions. Both partitioned state and >> timers >>>>> should be aware of keys and if a timer fires the partitioned state >>> should >>>>> be scoped to the same key that was active when the timer was >> registered. >>>>>> >>>>>> One last bit. If we want to also implement the current WindowOperator >>> on >>>>> top of these generic facilities we need to have a way to scope state >> not >>>>> only by key but also by windows (or better, some generic state scope). >>> The >>>>> reason is, that one key can have several active windows at one point >> in >>>>> time and firing timers need to me mapped to the correct window (for >>>>> example, for sliding windows, or session windows or what have you…). >>>>>> >>>>>> Happy discussing. :D >>>>>> >>>>>> Cheers, >>>>>> Aljoscha >>>>>> >>>>>> >>>>> >>> >>> >>