Agree to delete them, though for different reasons. I think this code comes
from a desire to have methods that can be called on a DoFn directly. And
from reviewing the code history I think they are copied in from another
class. So that's why they are the way they are. Accepting a DoFnSignature
would be more appropriate to the "plural-class-name companion class"
pattern. But I doubt the perf impact of this is ever measurable, and of
course not relative to a big data processing job. If we really wanted the
current API, a cache is trivial, but also not important so we shouldn't add
one.

Reason I think they should be deleted:
1. They seem to exist as a shortcut to people don't forget to call both
DoFnSignatures#usesState and DoFnSignatures#usesTimers [1]. But now if
another relevant method is added, the new method doesn't include it, so the
problem of not forgetting to call all relevant methods is not solved.
2. They seem incorrect [2]. Just because something requires time sorted
input *does not* mean it uses bag state.

Kenn

[1]
https://github.com/apache/beam/blob/dba5f2b9d8625a3be3dae026858ecacf20947616/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/reflect/DoFnSignatures.java#L2432
[2]
https://github.com/apache/beam/blob/dba5f2b9d8625a3be3dae026858ecacf20947616/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/reflect/DoFnSignatures.java#L2449

On Fri, May 29, 2020 at 8:46 AM Luke Cwik <lc...@google.com> wrote:

> To go back to your original question.
>
> I would remove the static convenience methods in DoFnSignatures since they
> construct a DoFnSignature and then throw it away. This construction is
> pretty involved, nothing as large as an IO call but it would become
> noticeable if it was abused. We can already see that it is being used
> multiple times in a row [1, 2].
>
> Runners should create their own derived properties based upon knowledge of
> how they are implemented and we shouldn't create derived properties for
> different concepts (e.g. merging isStateful and @RequiresTimeSortedInput).
> If there is a common implementation that is shared across multiple runners,
> it could "translate" a DoFnSignature based upon how it is implemented
> and/or define its own thing.
>
> 1:
> https://github.com/apache/beam/blob/0addd1f08a2e3f424199c1054c06f363bb77a019/runners/apex/src/main/java/org/apache/beam/runners/apex/translation/ParDoTranslator.java#L61
> 2:
> https://github.com/apache/beam/blob/0addd1f08a2e3f424199c1054c06f363bb77a019/runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/ParDoTranslatorBatch.java#L73
>
> On Wed, May 27, 2020 at 3:16 AM Jan Lukavský <je...@seznam.cz> wrote:
>
>> Right, this might be about a definition of what these methods really
>> should return. Currently, the most visible issue is [1]. When a DoFn has no
>> state or timer, but is annotated with @RequiresTimeSortedInput this
>> annotation is silently ignored, because DoFnSignature#usesState returns
>> false and the ParDo is executed as stateless.
>>
>> I agree that there are two points - what user declares and what runner
>> effectively needs to execute a DoFn. Another complication to this is that
>> what runner needs might depend not only on the DoFn itself, but on other
>> conditions - e.g. RequiresTimeSortedInput does not require any state or
>> timer in bounded case, when runner can presort the data. There might be
>> additional inputs to this decision as well.
>>
>> I don't quite agree that DoFnSignature#isStateful is a bad name - when a
>> DoFn has only timer and no state, it is still stateful, although usesState
>> should return false. Or we would have to declare timer a state, which would
>> be even more confusing (although it might be technically correct).
>>
>> [1] https://issues.apache.org/jira/browse/BEAM-10072
>> On 5/27/20 1:21 AM, Luke Cwik wrote:
>>
>> I believe DoFnSignature#isStateful is remnant of a bad API name choice
>> and was renamed to usesState. I would remove DoFnSignature#isStateful as it
>> does not seem to be used anywhere.
>>
>> Does DoFnSignatures#usesValueState return true if the DoFn says it needs
>> @RequiresTimeSortedInput because of how a DoFn is being "wrapped" with a
>> stateful DoFn that provides the time sorting functionality?
>>
>> That doesn't seem right since I would have always expected that
>> DoFnSignature(s) should be about the DoFn passed in and not about the
>> implementation details that a runner might be using in how it
>> provides @RequiresTimeSortedInput.
>>
>> (similarly for
>> DoFnSignatures#usesBagState, DoFnSignatures#usesWatermarkHold, 
>> DoFnSignatures#usesTimers, DoFnSignatures#usesState)
>>
>>
>>
>>
>> On Mon, May 25, 2020 at 2:31 AM Jan Lukavský <je...@seznam.cz> wrote:
>>
>>> Hi,
>>>
>>> I have come across issue with multiple way of getting a meaningful flags
>>> for DoFns. We have
>>>
>>>   a) DoFnSignature#{usesState,usesTimers,isStateful,...}, and
>>>
>>>   b) DoFnSignatures#{usesState,usesTimers,isStateful,...}
>>>
>>> These two might not (and actually are not) aligned with each other. That
>>> can be solved quite easily (removing any logic from DoFnSignatures and
>>> put it to DoFnSignature), but what I'm not sure is why
>>> DoFnSignature#isStateful is deprecated in favor of
>>> DoFnSignature#usesState. In my understanding, it should hold that
>>> `isStateful() iff usesState() || usesTimers()`, which means these two
>>> should not be used interchangeably. I'd suggest to undeprecate the
>>> `DoFnSignature#isStateful` and align the various (static and non-static)
>>> versions of these calls.
>>>
>>> Thoughts?
>>>
>>>   Jan
>>>
>>>

Reply via email to