GitHub user aljoscha commented on the pull request:
https://github.com/apache/flink/pull/1831#issuecomment-200933436
To elaborate on this: state currently works well if you stick to the
(admittedly somewhat hidden) rules. That is, you should only access state
when a key is available.
If no key is available, the behavior changes in unexpected ways depending on
which state backend is used and on the capabilities of the key serializer. For
example, consider access to `ValueState` in `open()`. For mem/fs state:
`ValueState.value()` works and returns the default value, while
`ValueState.update()` throws an NPE. For RocksDB state: neither method works
if the key serializer cannot handle null values. If it can, both methods
operate on the state stored under the `null` key.
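
To make the mem/fs asymmetry concrete, here is a minimal toy model in plain Java. It is not Flink code: `ToyHeapValueState` and `setCurrentKey` are hypothetical names, and the null checks are an assumption chosen only to reproduce the behavior described above (reads fall back to the default, writes fail) when no key has been set.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy model, not Flink code: a heap-backed value state whose
// reads and writes are scoped to a "current key" set by the runtime.
class ToyHeapValueState<K, V> {
    private final Map<K, V> values = new HashMap<>();
    private final V defaultValue;
    private K currentKey; // stays null until a key is set for an element

    ToyHeapValueState(V defaultValue) {
        this.defaultValue = defaultValue;
    }

    // In this model the runtime calls this before each keyed element.
    void setCurrentKey(K key) {
        this.currentKey = key;
    }

    // Without a key, the read silently returns the default value.
    V value() {
        if (currentKey == null) {
            return defaultValue;
        }
        return values.getOrDefault(currentKey, defaultValue);
    }

    // Without a key, the write fails with a NullPointerException.
    void update(V newValue) {
        if (currentKey == null) {
            throw new NullPointerException("no current key");
        }
        values.put(currentKey, newValue);
    }
}
```

Calling `value()` before any `setCurrentKey` call returns the default, while `update()` throws, so code that appears to work for reads in an `open()`-like phase breaks as soon as it writes.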
For these reasons I would like to change the semantics of state so that
the user always has to call `getState` (or a similar method), and so that the
returned accessor object is documented to be valid only for the duration of the
processing method. Right now, the user can wreak all kinds of havoc by
down-casting the returned State object. The current system is very simple: it
works if the user keeps to the rules, and it is fast. If we make it more
restrictive we will lose some performance, of course.
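
One way the proposed semantics could look is sketched below in plain Java. This is only an illustration under stated assumptions, not the design from the pull request and not Flink's API: `ScopedValueAccessor`, `ToyKeyedContext`, `processElement`, and `invalidate` are hypothetical names. The accessor is bound to one key and stops working once the processing call returns.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not Flink's API: an accessor bound to a single key
// that is only valid while one element is being processed.
class ScopedValueAccessor<V> {
    private final Map<Object, V> store;
    private final Object key;
    private boolean valid = true;

    ScopedValueAccessor(Map<Object, V> store, Object key) {
        this.store = store;
        this.key = key;
    }

    V value() {
        checkValid();
        return store.get(key);
    }

    void update(V newValue) {
        checkValid();
        store.put(key, newValue);
    }

    // Called by the context once the processing method has returned.
    void invalidate() {
        valid = false;
    }

    private void checkValid() {
        if (!valid) {
            throw new IllegalStateException("accessor used outside processing method");
        }
    }
}

// Hypothetical runtime context: getState() only succeeds while a key is set.
class ToyKeyedContext<V> {
    private final Map<Object, V> store = new HashMap<>();
    private ScopedValueAccessor<V> current; // non-null only during processing

    ScopedValueAccessor<V> getState() {
        if (current == null) {
            throw new IllegalStateException("no key available");
        }
        return current;
    }

    // Creates a fresh accessor per element and invalidates it afterwards.
    void processElement(Object key, Runnable userFunction) {
        current = new ScopedValueAccessor<>(store, key);
        try {
            userFunction.run();
        } finally {
            current.invalidate();
            current = null;
        }
    }
}
```

In this sketch every element costs an extra allocation and every access costs a validity check, which is the kind of overhead a more restrictive design would likely introduce.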