Answers inline.
On 5/29/20 5:46 PM, Luke Cwik wrote:
To go back to your original question.
I would remove the static convenience methods in DoFnSignatures since
they construct a DoFnSignature and then throw it away. This
construction is pretty involved, nothing as large as an IO call but it
would become noticeable if it was abused. We can already see that it
is being used multiple times in a row [1, 2].
There should be no performance implications of this, as there is cache
involved [1].
Runners should create their own derived properties based upon
knowledge of how they are implemented and we shouldn't create derived
properties for different concepts (e.g. merging isStateful and
@RequiresTimeSortedInput). If there is a common implementation that is
shared across multiple runners, it could "translate" a DoFnSignature
based upon how it is implemented and/or define its own thing.
The problem here is that in order to use a common implementation a
runner must know that it should use it (in this specific case to use
StatefulDoFnRunner instead of plain SimpleDoFnRunner). This might
slightly relate to discussion about pipeline requirements vs. runner
capabilities, although from a different perspective.
[1]
https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/reflect/DoFnSignatures.java#L262
Jan
On Wed, May 27, 2020 at 3:16 AM Jan Lukavský <je...@seznam.cz
<mailto:je...@seznam.cz>> wrote:
Right, this might be about a definition of what these methods
really should return. Currently, the most visible issue is [1].
When a DoFn has no state or timer, but is annotated with
@RequiresTimeSortedInput this annotation is silently ignored,
because DoFnSignature#usesState returns false and the ParDo is
executed as stateless.
I agree that there are two points - what user declares and what
runner effectively needs to execute a DoFn. Another complication
to this is that what runner needs might depend not only on the
DoFn itself, but on other conditions - e.g.
RequiresTimeSortedInput does not require any state or timer in
bounded case, when runner can presort the data. There might be
additional inputs to this decision as well.
I don't quite agree that DoFnSignature#isStateful is a bad name -
when a DoFn has only timer and no state, it is still stateful,
although usesState should return false. Or we would have to
declare timer a state, which would be even more confusing
(although it might be technically correct).
[1] https://issues.apache.org/jira/browse/BEAM-10072
On 5/27/20 1:21 AM, Luke Cwik wrote:
I believe DoFnSignature#isStateful is remnant of a bad API name
choice and was renamed to usesState. I would remove
DoFnSignature#isStateful as it does not seem to be used anywhere.
Does DoFnSignatures#usesValueState return true if the DoFn says
it needs @RequiresTimeSortedInput because of how a DoFn is being
"wrapped" with a stateful DoFn that provides the time sorting
functionality?
That doesn't seem right since I would have always expected that
DoFnSignature(s) should be about the DoFn passed in and not about
the implementation details that a runner might be using in how it
provides @RequiresTimeSortedInput.
(similarly for
DoFnSignatures#usesBagState, DoFnSignatures#usesWatermarkHold,
DoFnSignatures#usesTimers, DoFnSignatures#usesState)
On Mon, May 25, 2020 at 2:31 AM Jan Lukavský <je...@seznam.cz
<mailto:je...@seznam.cz>> wrote:
Hi,
I have come across issue with multiple way of getting a
meaningful flags
for DoFns. We have
a) DoFnSignature#{usesState,usesTimers,isStateful,...}, and
b) DoFnSignatures#{usesState,usesTimers,isStateful,...}
These two might not (and actually are not) aligned with each
other. That
can be solved quite easily (removing any logic from
DoFnSignatures and
put it to DoFnSignature), but what I'm not sure is why
DoFnSignature#isStateful is deprecated in favor of
DoFnSignature#usesState. In my understanding, it should hold
that
`isStateful() iff usesState() || usesTimers()`, which means
these two
should not be used interchangeably. I'd suggest to
undeprecate the
`DoFnSignature#isStateful` and align the various (static and
non-static)
versions of these calls.
Thoughts?
Jan