On Fri, May 5, 2017 at 1:33 PM, Kenneth Knowles <k...@google.com.invalid> wrote:
> On Fri, May 5, 2017 at 12:43 PM, Robert Bradshaw <
> rober...@google.com.invalid> wrote:
>
>> On Fri, May 5, 2017 at 12:14 PM, Kenneth Knowles <k...@google.com.invalid>
>> wrote:
>> > In Java, we _can_ now enforce this statically, thanks to Thomas G's
>> > improvements. We could either force it to be globally windowed or - since
>> > it can't inspect the element or timestamp - just run a couple test calls
>> > through "assignWindows", optionally confirming equality of the results.
>>
>> Could you clarify more? I'd like to enforce this statically, but I
>> don't see how to detect a particular DoFn emits a globally windowed
>> value. But that make things much better.
>
> Currently, FinishBundleContext has just the outputWindowedValue method.

Things are changing fast around here :).

> To
> maintain that, safely, we would add GloballyWindowedOutput or some such
> with the output() and outputTimestamp() method. This is routine work quite
> analogous to what we already have. We would validate that the output
> windowing strategy is global windows if the parameter is present.

So the current recommendation is to write

        c.output(
            [value],
            BoundedWindow.TIMESTAMP_MIN_VALUE,
            GlobalWindow.INSTANCE);

but we could recommend the user use a different context to not have to
specify BoundedWindow.TIMESTAMP_MIN_VALUE and GlobalWindow.INSTANCE so
that we could check this at compile time.

>> We
>> > can also enforce that the correct subclass of window is output, though
>> this
>> > is not implemented as such right now.
>>
>> Are you thinking a type check based on the reflectively obtained
>> parameter of the WindowFn?
>>
>
> Yes. Though we don't do it today, it would be OutputWithWindow<W> or some
> such (which is actually the same as FinishBundleContext<W>) where W was
> some window class that would be checked against the window coder. We cannot
> use the same inner class trick to avoid referencing the type, but that
> shouldn't be burdensome in this case.

Is adding a type parameter backwards compatible? I suppose with raw
types it is... Not sure how this would work with a proper BatchignDoFn
that may output to any window type, but maybe.

Python's a bit different as it doesn't use contexts to emit values.

>> But this behavior is all to support people who want to write transforms
>> > that work with essentially only one windowing strategy, basically
>> > pre-windowing stuff. Are there really any meaningful alternatives to the
>> > global window? If so, I don't think that is realistically what this is
>> > about. Such transforms should be permitted, but not encouraged or
>> supported
>> > automatically, as they are antithetical to the unified model. So I would
>> > also suggest that they should validate that their input is globally
>> > windowed and explicitly output to the global window. Trivial and they
>> > already should be doing that. They also have the capability of storing
>> the
>> > input's WindowFn and producing behavior identical to what you describe.
>>
>> What it boils down to is that this is a common case that only works
>> easily in the Global window. Another example that I can think of like
>> this in the SDK, namely global combine. If you're in the global
>> window, you get a singleton PCollection, and we don't require users to
>> do extra for this behavior. If someone tries to use this transform in
>> a non-globally-windowed setting, only then do we give an error (this
>> time at pipeline construction) and force the user to do extra work.
>> It's a balance between having something that works correctly in the
>> unified model but requires extra (potentially tricky and error-prone)
>> effort only iff one needs it.

The question above is the crux of the issue--are we striking the right
balance in requiring the user to specify a Window in all cases when we
can infer it in the by far most common case (like with global
combine's default value) and can detect and warn the user when they
don't fall into that case.

Reply via email to