Right, this might come down to defining what these methods really should return. Currently, the most visible issue is [1]. When a DoFn has no state or timer but is annotated with @RequiresTimeSortedInput, the annotation is silently ignored, because DoFnSignature#usesState returns false and the ParDo is executed as stateless.
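
To make the failure mode concrete, here is a minimal sketch of it. It is a simplified model and not Beam's actual DoFnSignature code: the class, the Flags holder and both decision methods are hypothetical names chosen for illustration, and the "aware" variant is only one possible fix.

```java
// Simplified model for illustration only. These are NOT Beam's DoFnSignature
// classes; every name below is hypothetical.
final class TimeSortedFlagsSketch {

  /** The flags a signature might report for one DoFn. */
  static final class Flags {
    final boolean usesState;
    final boolean usesTimers;
    final boolean requiresTimeSortedInput;

    Flags(boolean usesState, boolean usesTimers, boolean requiresTimeSortedInput) {
      this.usesState = usesState;
      this.usesTimers = usesTimers;
      this.requiresTimeSortedInput = requiresTimeSortedInput;
    }
  }

  /** Consults only usesState, so the annotation has no effect on the decision. */
  static boolean executeAsStatefulNaive(Flags flags) {
    return flags.usesState;
  }

  /** One possible alternative (an assumption): the annotation also forces the stateful path. */
  static boolean executeAsStatefulAware(Flags flags) {
    return flags.usesState || flags.usesTimers || flags.requiresTimeSortedInput;
  }

  public static void main(String[] args) {
    // A DoFn with no state and no timer, but annotated as requiring sorted input.
    Flags annotatedOnly = new Flags(false, false, true);
    System.out.println("naive: " + executeAsStatefulNaive(annotatedOnly));
    System.out.println("aware: " + executeAsStatefulAware(annotatedOnly));
  }
}
```

With the naive check, the annotated DoFn takes the stateless path and the sorting requirement is dropped without any error.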

I agree that there are two aspects: what the user declares and what the runner actually needs in order to execute a DoFn. A further complication is that what the runner needs might depend not only on the DoFn itself but also on other conditions. For example, @RequiresTimeSortedInput does not require any state or timer in the bounded case, when the runner can presort the data. There might be additional inputs to this decision as well.
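
A rough sketch of that second aspect, under the assumption that the runner knows whether the input is bounded and whether it is able to presort. The class, method and parameter names are hypothetical and do not correspond to any existing Beam API.

```java
// Hypothetical helper, not part of Beam: shows that the runner-side answer
// depends on more than the DoFn's own declaration.
final class RunnerRequirementSketch {

  /**
   * Returns true when the runner would need state/timers of its own to provide
   * time-sorted input, given what the DoFn declares and how the input looks.
   */
  static boolean needsStateForSorting(
      boolean requiresTimeSortedInput, boolean inputIsBounded, boolean runnerCanPresort) {
    if (!requiresTimeSortedInput) {
      return false; // nothing to sort, so no extra state is needed
    }
    // Bounded input that the runner can presort needs no buffering in state.
    return !(inputIsBounded && runnerCanPresort);
  }

  public static void main(String[] args) {
    System.out.println("bounded, presort:   " + needsStateForSorting(true, true, true));
    System.out.println("unbounded, presort: " + needsStateForSorting(true, false, true));
  }
}
```

The same declaration on the DoFn yields different answers here, which is why a single flag on the signature cannot express both what the user declared and what the runner needs.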

I don't quite agree that DoFnSignature#isStateful is a bad name. When a DoFn has only a timer and no state, it is still stateful, although usesState should return false. Otherwise we would have to declare a timer to be state, which would be even more confusing (although it might be technically correct).
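
The relation I have in mind, `isStateful() iff usesState() || usesTimers()`, can be sketched as follows. This is a standalone illustration with hypothetical names, not the current DoFnSignature implementation.

```java
// Standalone illustration; the class and method are hypothetical and only
// mirror the names discussed in this thread.
final class StatefulInvariantSketch {

  /** A DoFn is stateful when it uses state, timers, or both. */
  static boolean isStateful(boolean usesState, boolean usesTimers) {
    return usesState || usesTimers;
  }

  public static void main(String[] args) {
    boolean usesState = false;
    boolean usesTimers = true; // a timer-only DoFn
    System.out.println("usesState:  " + usesState);
    System.out.println("isStateful: " + isStateful(usesState, usesTimers));
  }
}
```

For a timer-only DoFn the two values differ, so treating isStateful and usesState as synonyms would change behavior.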

[1] https://issues.apache.org/jira/browse/BEAM-10072

On 5/27/20 1:21 AM, Luke Cwik wrote:
I believe DoFnSignature#isStateful is a remnant of a bad API name choice and was renamed to usesState. I would remove DoFnSignature#isStateful, as it does not seem to be used anywhere.

Does DoFnSignatures#usesValueState return true if the DoFn says it needs @RequiresTimeSortedInput because of how a DoFn is being "wrapped" with a stateful DoFn that provides the time sorting functionality?

That doesn't seem right since I would have always expected that DoFnSignature(s) should be about the DoFn passed in and not about the implementation details that a runner might be using in how it provides @RequiresTimeSortedInput.

(similarly for DoFnSignatures#usesBagState, DoFnSignatures#usesWatermarkHold, DoFnSignatures#usesTimers, DoFnSignatures#usesState)




On Mon, May 25, 2020 at 2:31 AM Jan Lukavský <je...@seznam.cz <mailto:je...@seznam.cz>> wrote:

    Hi,

    I have come across an issue with multiple ways of getting meaningful
    flags for DoFns. We have

      a) DoFnSignature#{usesState,usesTimers,isStateful,...}, and

      b) DoFnSignatures#{usesState,usesTimers,isStateful,...}

    These two might not be (and actually are not) aligned with each other.
    That can be solved quite easily (by removing any logic from
    DoFnSignatures and putting it into DoFnSignature), but what I'm not sure
    about is why DoFnSignature#isStateful is deprecated in favor of
    DoFnSignature#usesState. In my understanding, it should hold that
    `isStateful() iff usesState() || usesTimers()`, which means these two
    should not be used interchangeably. I'd suggest undeprecating
    `DoFnSignature#isStateful` and aligning the various (static and
    non-static) versions of these calls.

    Thoughts?

      Jan
