Right, this might come down to a definition of what these methods should
really return. Currently, the most visible issue is [1]. When a DoFn has
no state or timers, but is annotated with @RequiresTimeSortedInput, this
annotation is silently ignored, because DoFnSignature#usesState returns
false and the ParDo is executed as stateless.
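A minimal sketch of the symptom, written in plain Java rather than against Beam itself. DeclaredFlags and ExecutionChoice are hypothetical stand-ins, not Beam classes. runsStatefulNaive only models a check that looks at usesState alone, and runsStatefulFixed is one possible alternative, not Beam's actual fix.

```java
// Hypothetical, simplified stand-in for the flags a DoFnSignature exposes.
// NOT Beam's actual classes; they only model the logic discussed above.
final class DeclaredFlags {
  final boolean usesState;
  final boolean usesTimers;
  final boolean requiresTimeSortedInput;

  DeclaredFlags(boolean usesState, boolean usesTimers, boolean requiresTimeSortedInput) {
    this.usesState = usesState;
    this.usesTimers = usesTimers;
    this.requiresTimeSortedInput = requiresTimeSortedInput;
  }
}

final class ExecutionChoice {
  // Models the problematic check: only declared state decides the execution
  // mode, so @RequiresTimeSortedInput on a DoFn without state is dropped.
  static boolean runsStatefulNaive(DeclaredFlags f) {
    return f.usesState;
  }

  // Hypothetical check that also honours timers and the time-sorting requirement.
  static boolean runsStatefulFixed(DeclaredFlags f) {
    return f.usesState || f.usesTimers || f.requiresTimeSortedInput;
  }

  public static void main(String[] args) {
    // A DoFn with no state or timers, annotated only with @RequiresTimeSortedInput.
    DeclaredFlags sortedOnly = new DeclaredFlags(false, false, true);
    System.out.println("naive: " + runsStatefulNaive(sortedOnly)); // false: annotation ignored
    System.out.println("fixed: " + runsStatefulFixed(sortedOnly)); // true
  }
}
```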
I agree that there are two separate concerns: what the user declares and
what the runner effectively needs to execute a DoFn. A further
complication is that what the runner needs might depend not only on the
DoFn itself, but on other conditions. For example, RequiresTimeSortedInput
does not require any state or timers in the bounded case, where the
runner can presort the data. There might be additional inputs to this
decision as well.
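One way to picture the split between "declared" and "effectively needed" is a separate planning step that takes the declared flags plus context such as boundedness. This is a sketch under stated assumptions: RuntimeNeed and RunnerPlanner are hypothetical names, and the unbounded strategy (buffer in state, flush on an event-time timer) is an assumed implementation, not a description of any specific runner.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical categories of what a runner must provide; not Beam API.
enum RuntimeNeed { KEYED_STATE, EVENT_TIME_TIMERS, PRESORTED_INPUT }

final class RunnerPlanner {
  // Derives what the runner needs from what the DoFn declares plus context
  // (here only boundedness) that the signature itself cannot know.
  static Set<RuntimeNeed> effectiveNeeds(
      boolean declaresState, boolean declaresTimers,
      boolean requiresTimeSortedInput, boolean inputIsBounded) {
    EnumSet<RuntimeNeed> needs = EnumSet.noneOf(RuntimeNeed.class);
    if (declaresState) needs.add(RuntimeNeed.KEYED_STATE);
    if (declaresTimers) needs.add(RuntimeNeed.EVENT_TIME_TIMERS);
    if (requiresTimeSortedInput) {
      if (inputIsBounded) {
        // Bounded input: the runner can sort by timestamp up front,
        // so no extra state or timers are needed.
        needs.add(RuntimeNeed.PRESORTED_INPUT);
      } else {
        // Unbounded input (assumed strategy): buffer elements in state and
        // emit them in timestamp order when an event-time timer fires.
        needs.add(RuntimeNeed.KEYED_STATE);
        needs.add(RuntimeNeed.EVENT_TIME_TIMERS);
      }
    }
    return needs;
  }

  public static void main(String[] args) {
    System.out.println(effectiveNeeds(false, false, true, true));  // [PRESORTED_INPUT]
    System.out.println(effectiveNeeds(false, false, true, false)); // [KEYED_STATE, EVENT_TIME_TIMERS]
  }
}
```

Keeping this logic outside the signature leaves the signature describing only the DoFn, while the runner is free to choose a cheaper strategy for bounded input.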
I don't quite agree that DoFnSignature#isStateful is a bad name: when a
DoFn has only timers and no state, it is still stateful, although
usesState should return false. Otherwise we would have to declare a timer
to be state, which would be even more confusing (although it might be
technically correct).
[1] https://issues.apache.org/jira/browse/BEAM-10072
On 5/27/20 1:21 AM, Luke Cwik wrote:
I believe DoFnSignature#isStateful is a remnant of a bad API name choice
and was renamed to usesState. I would remove DoFnSignature#isStateful,
as it does not seem to be used anywhere.
Does DoFnSignatures#usesValueState return true if the DoFn says it
needs @RequiresTimeSortedInput because of how a DoFn is being
"wrapped" with a stateful DoFn that provides the time sorting
functionality?
That doesn't seem right, since I would always have expected
DoFnSignature(s) to be about the DoFn passed in and not about the
implementation details of how a runner provides
@RequiresTimeSortedInput.
(similarly for
DoFnSignatures#usesBagState, DoFnSignatures#usesWatermarkHold, DoFnSignatures#usesTimers, DoFnSignatures#usesState)
On Mon, May 25, 2020 at 2:31 AM Jan Lukavský <je...@seznam.cz> wrote:
Hi,
I have come across an issue with there being multiple ways of getting
meaningful flags for DoFns. We have
a) DoFnSignature#{usesState,usesTimers,isStateful,...}, and
b) DoFnSignatures#{usesState,usesTimers,isStateful,...}
These two might not be (and actually are not) aligned with each other.
That can be solved quite easily (by removing any logic from
DoFnSignatures and moving it to DoFnSignature), but what I'm not sure
about is why DoFnSignature#isStateful is deprecated in favor of
DoFnSignature#usesState. In my understanding, it should hold that
`isStateful() iff usesState() || usesTimers()`, which means these two
should not be used interchangeably. I'd suggest undeprecating
`DoFnSignature#isStateful` and aligning the various (static and
non-static) versions of these calls.
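The proposed invariant can be sketched with a hypothetical SignatureModel class (not Beam's DoFnSignature) that derives isStateful() from the other two flags instead of storing it separately.

```java
// Hypothetical model of the proposed signature flags; not Beam API.
final class SignatureModel {
  private final boolean usesState;
  private final boolean usesTimers;

  SignatureModel(boolean usesState, boolean usesTimers) {
    this.usesState = usesState;
    this.usesTimers = usesTimers;
  }

  boolean usesState() { return usesState; }

  boolean usesTimers() { return usesTimers; }

  // Proposed invariant: isStateful() iff usesState() || usesTimers().
  boolean isStateful() { return usesState() || usesTimers(); }

  public static void main(String[] args) {
    SignatureModel timerOnly = new SignatureModel(false, true);
    // A timer-only DoFn is stateful even though it declares no state.
    System.out.println(timerOnly.isStateful() + " " + timerOnly.usesState()); // true false
  }
}
```

Deriving the flag rather than storing it means the static and non-static variants cannot drift apart, as long as both delegate to the same definition.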
Thoughts?
Jan