[
https://issues.apache.org/jira/browse/BEAM-2950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16164021#comment-16164021
]
Kenneth Knowles commented on BEAM-2950:
---------------------------------------
-1
We considered this and opted not to do it. Here are the criteria considered for
the user-facing State API:
https://s.apache.org/beam-state#heading=h.j7e7f226dsrr
Another criterion, which we left off the document, was the readability of
end-user code.
So I think this proposal has the following flaws:
# To start with, it isn't very useful. PCollectionView.get() makes it easier to
wrap a DoFn with side inputs to produce a composite with side inputs, because
side inputs are passed in from outside the composite, and PCollectionViews are
globally rooted values. StateSpec is not passed in, but declared within the
DoFn, rooted in the primitive ParDo, so there's no extensibility gained. There
is no "double wiring" problem like we have with side inputs. It doesn't make
any sense for a composite to have state.
# Direct access at first appears "more intuitive" because to a newcomer it
"looks like" normal field access. But in fact it is nothing like normal field
access so this intuition is misleading and should not be encouraged. So it is
actually less readable because your intuitive reading is wrong.
# This design would miss the validation aspect. One way state access differs
from normal mutation-style programming is that there are many places where it
is illegal to reference state, such as StartBundle/FinishBundle, or when
passing it to another object. This proposal would turn those into dynamic
failures at best, or in the worst case data corruption (the runner fails to
catch the illegal access and permits some thread-global context to leak).
# It is actually mandatory that we are always able to detect state, as it is
essentially a different primitive (VanillaParDo, SplittableParDo, and
StatefulParDo are executed totally differently, even in the mathematical sense).
# As for timers, we need to associate the ID with a method. An alternative is
serialized callbacks, so if you include timers you need to include a full
design for that.
# A runner can't automatically prefetch, because it doesn't know which state is
used by which methods.
# Magic that works by mutating things into place is just less readable and
more error-prone.
There's a very strong burden of proof / design doc / dev list consensus to move
in this direction.
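
To make the validation, detection, and prefetch points above concrete, here is a minimal sketch of what declaring state in the method signature buys an analyzer. It is plain Java reflection and does not use Beam: the annotations StateId, ProcessElement, and StartBundle below are local stand-ins that only borrow the names, and ValueCell, CountingFn, BadFn, and the three helper methods are hypothetical names invented for illustration. It is not how any runner is actually implemented.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.lang.reflect.Parameter;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class StateSignatureSketch {
  // Local stand-ins for SDK annotations (assumption: NOT Beam's real classes).
  @Retention(RetentionPolicy.RUNTIME) @Target(ElementType.PARAMETER)
  @interface StateId { String value(); }

  @Retention(RetentionPolicy.RUNTIME) @Target(ElementType.METHOD)
  @interface ProcessElement {}

  @Retention(RetentionPolicy.RUNTIME) @Target(ElementType.METHOD)
  @interface StartBundle {}

  /** Toy state cell; a hypothetical placeholder for a real state type. */
  static final class ValueCell { int value; }

  /** State use is visible in the signature of the method that needs it. */
  static final class CountingFn {
    @StartBundle
    public void startBundle() {}

    @ProcessElement
    public void process(String element, @StateId("count") ValueCell count) {
      count.value++;
    }
  }

  /** Deliberately wrong: asks for state in a bundle-level method. */
  static final class BadFn {
    @StartBundle
    public void startBundle(@StateId("count") ValueCell count) {}
  }

  /** Maps each method name to the state IDs it declares (prefetch input). */
  static Map<String, List<String>> stateIdsByMethod(Class<?> fnClass) {
    Map<String, List<String>> result = new TreeMap<>();
    for (Method m : fnClass.getDeclaredMethods()) {
      List<String> ids = new ArrayList<>();
      for (Parameter p : m.getParameters()) {
        StateId id = p.getAnnotation(StateId.class);
        if (id != null) {
          ids.add(id.value());
        }
      }
      if (!ids.isEmpty()) {
        result.put(m.getName(), ids);
      }
    }
    return result;
  }

  /** Detects statefulness from the class alone, without running user code. */
  static boolean isStateful(Class<?> fnClass) {
    return !stateIdsByMethod(fnClass).isEmpty();
  }

  /** Rejects state parameters on StartBundle methods before execution. */
  static List<String> validate(Class<?> fnClass) {
    List<String> errors = new ArrayList<>();
    for (Method m : fnClass.getDeclaredMethods()) {
      if (!m.isAnnotationPresent(StartBundle.class)) {
        continue;
      }
      for (Parameter p : m.getParameters()) {
        if (p.isAnnotationPresent(StateId.class)) {
          errors.add(m.getName() + " may not access state");
        }
      }
    }
    return errors;
  }

  public static void main(String[] args) {
    System.out.println(stateIdsByMethod(CountingFn.class));
    System.out.println(isStateful(CountingFn.class));
    System.out.println(validate(BadFn.class));
  }
}
```

With these toy classes, the analyzer learns that process uses the state ID "count" and can reject BadFn before any element is processed. With implicit access through a thread-global context, none of the three helpers would have anything to inspect, so the same mistakes would surface only at run time, if at all.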
> Provide implicit access to State
> --------------------------------
>
> Key: BEAM-2950
> URL: https://issues.apache.org/jira/browse/BEAM-2950
> Project: Beam
> Issue Type: Bug
> Components: sdk-java-core
> Reporter: Eugene Kirpichov
> Assignee: Kenneth Knowles
>
> https://github.com/apache/beam/pull/3814 provides implicit access to side
> inputs (without a ProcessContext). Luke suggests to have the same for State
> and, I suppose, timers. We could also have it for PipelineOptions: in any
> given user code invocation, these are all unambiguous.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)