The third option for batching:

- Functionality within the DoFn and DoFnRunner built as part of the SDK.

I haven't thought through Batching, but at least for the
IntraBundleParallelization use case this actually does make sense to expose
as a part of the model. Knowing that a DoFn supports parallelization, a
runner may want to control how much parallelization is allowed, and the
DoFn also needs to make sure to wait on all those threads (and make sure
they're properly set up for logging/metrics/etc. associated with the current
step).
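
To make that concrete, here is a rough sketch in plain Java of the shape I
have in mind. It is not Beam code: ParallelBundleSketch, its method names, and
the maxParallelism knob are all hypothetical, and it leaves out the
logging/metrics setup entirely. It only shows a bound on parallelism that
something outside the function could choose, plus a wait on all outstanding
work before the bundle is considered finished.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

// Hypothetical sketch, not a Beam API: processes the elements of one bundle
// on a bounded thread pool and waits for all of them when the bundle ends.
class ParallelBundleSketch<T> {
    private final int maxParallelism;      // assumed to be chosen by the runner
    private final Consumer<T> perElement;  // the user's per-element work
    private ExecutorService executor;
    private List<Future<?>> pending;

    ParallelBundleSketch(int maxParallelism, Consumer<T> perElement) {
        this.maxParallelism = maxParallelism;
        this.perElement = perElement;
    }

    void startBundle() {
        // At most maxParallelism elements are processed concurrently.
        executor = Executors.newFixedThreadPool(maxParallelism);
        pending = new ArrayList<>();
    }

    void processElement(T element) {
        pending.add(executor.submit(() -> perElement.accept(element)));
    }

    void finishBundle() {
        try {
            // Wait on every outstanding task before the bundle can commit.
            for (Future<?> f : pending) {
                f.get();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        } catch (ExecutionException e) {
            throw new RuntimeException(e.getCause());
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        AtomicInteger sum = new AtomicInteger();
        ParallelBundleSketch<Integer> sketch =
            new ParallelBundleSketch<>(4, sum::addAndGet);
        sketch.startBundle();
        for (int i = 1; i <= 100; i++) {
            sketch.processElement(i);
        }
        sketch.finishBundle();
        System.out.println(sum.get()); // prints 5050
    }
}
```

If the parallelism limit were part of the model, the value passed to the
constructor here is what a runner would get to pick.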

There may be good reasons to make this a property of a DoFn that the runner
can inspect and support. For instance, if a DoFn wants to process batches
of 50, it may be possible to factor that into how input is split/bundled.
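
As an illustration of the batches-of-50 case, a minimal plain-Java sketch
follows. Again, none of this is an existing Beam API: BatchingSketch and its
batchSize() accessor are made-up names, standing in for whatever property a
runner might inspect. The function buffers elements, hands off full batches,
and flushes the remainder when the bundle ends.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch, not a Beam API: buffers elements into fixed-size
// batches and flushes any partial batch at the end of the bundle.
class BatchingSketch<T> {
    private final int batchSize;
    private final Consumer<List<T>> perBatch;
    private List<T> buffer = new ArrayList<>();

    BatchingSketch(int batchSize, Consumer<List<T>> perBatch) {
        this.batchSize = batchSize;
        this.perBatch = perBatch;
    }

    // The property a runner could inspect when deciding how to split input.
    int batchSize() {
        return batchSize;
    }

    void processElement(T element) {
        buffer.add(element);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    void finishBundle() {
        // A bundle that is not a multiple of batchSize leaves a short batch.
        if (!buffer.isEmpty()) {
            flush();
        }
    }

    private void flush() {
        List<T> batch = buffer;
        buffer = new ArrayList<>();
        perBatch.accept(batch);
    }

    public static void main(String[] args) {
        List<Integer> sizes = new ArrayList<>();
        BatchingSketch<Integer> sketch =
            new BatchingSketch<>(50, batch -> sizes.add(batch.size()));
        for (int i = 0; i < 120; i++) {
            sketch.processElement(i);
        }
        sketch.finishBundle();
        System.out.println(sizes); // prints [50, 50, 20]
    }
}
```

The trailing batch of 20 is the part a runner could avoid if it knew the
batch size and bundled input in multiples of it.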

On Thu, Jan 26, 2017 at 3:49 PM Kenneth Knowles <k...@google.com.invalid>
wrote:

> On Thu, Jan 26, 2017 at 3:42 PM, Eugene Kirpichov <
> kirpic...@google.com.invalid> wrote:
>
> > The class for invoking DoFn's,
> > DoFnInvokers, is absent from the SDK (and present in runners-core) for a
> > good reason.
> >
>
> This would be true if it weren't for that pesky DoFnTester :-)
>
> And even if we solve that problem, in the future it will be in the SDK's Fn
> Harness.
>
