Some extensions to the DoFn API

Reuven Lax Mon, 04 Jun 2018 07:07:06 -0700

Beam was created with an annotation-based processing API that allows the
framework to automatically inject parameters into a DoFn's process method
(and also allows the user to mark any method as the process method
using @ProcessElement). However, this annotation support was never completed.
A specific set of parameters could be injected (e.g. the window or
PipelineOptions), but anything else had to be accessed through the
ProcessContext. This limited the readability advantage of the API.


A couple of months ago I spent a bit of time extending the set of
annotations allowed. In particular, the most common uses of ProcessContext
were accessing the input element and outputting elements, and both of those
can now be done without ProcessContext. Example usage:

new DoFn<InputT, OutputT>() {
  @ProcessElement
  public void process(@Element InputT element, OutputReceiver<OutputT> out) {
    out.output(convertInputToOutput(element));
  }
}

No need for ProcessContext anywhere in this DoFn! The Beam framework also
does type checking: if the @Element parameter's type were not InputT, you
would see an error. Multi-output DoFns also work, using the
MultiOutputReceiver interface.
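
To make the idea of annotation-driven injection and up-front type checking
concrete, here is a minimal, self-contained sketch in plain Java. It is a toy
model only, not Beam's implementation: the class InjectionSketch, its nested
annotations and OutputReceiver interface, the run() helper, and the two example
functions are all hypothetical names local to this sketch, and it assumes a
runner that only knows how to inject the element and a single output receiver.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.lang.reflect.Parameter;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model of annotation-driven parameter injection. Every name here is
// hypothetical and local to this sketch; none of it is Beam code.
final class InjectionSketch {
  @Retention(RetentionPolicy.RUNTIME)
  @Target(ElementType.METHOD)
  @interface ProcessElement {}

  @Retention(RetentionPolicy.RUNTIME)
  @Target(ElementType.PARAMETER)
  @interface Element {}

  interface OutputReceiver<T> {
    void output(T value);
  }

  // A user function written in the injected-parameter style.
  static class UpperCaseFn {
    @ProcessElement
    public void process(@Element String element, OutputReceiver<String> out) {
      out.output(element.toUpperCase());
    }
  }

  // Declares the wrong @Element type for String input; run() rejects it.
  static class MismatchedFn {
    @ProcessElement
    public void process(@Element Integer element, OutputReceiver<String> out) {
      out.output(String.valueOf(element));
    }
  }

  static <InputT> List<Object> run(Object fn, Class<InputT> inputType, List<InputT> inputs) {
    // Locate the method the user marked as the process method.
    Method process = null;
    for (Method m : fn.getClass().getDeclaredMethods()) {
      if (m.isAnnotationPresent(ProcessElement.class)) {
        process = m;
      }
    }
    if (process == null) {
      throw new IllegalArgumentException("No @ProcessElement method found");
    }

    // Validate the signature once, before any element is processed.
    Parameter[] params = process.getParameters();
    for (Parameter p : params) {
      if (p.isAnnotationPresent(Element.class)) {
        if (!p.getType().isAssignableFrom(inputType)) {
          throw new IllegalArgumentException(
              "@Element type " + p.getType().getName()
                  + " does not match input type " + inputType.getName());
        }
      } else if (p.getType() != OutputReceiver.class) {
        throw new IllegalArgumentException(
            "Cannot inject parameter of type " + p.getType().getName());
      }
    }

    // Build the argument list for each element and invoke the method.
    List<Object> outputs = new ArrayList<>();
    OutputReceiver<Object> receiver = outputs::add;
    try {
      process.setAccessible(true);
      for (InputT input : inputs) {
        Object[] args = new Object[params.length];
        for (int i = 0; i < params.length; i++) {
          args[i] = params[i].isAnnotationPresent(Element.class) ? input : receiver;
        }
        process.invoke(fn, args);
      }
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException(e);
    }
    return outputs;
  }

  public static void main(String[] args) {
    // prints [A, B]
    System.out.println(run(new UpperCaseFn(), String.class, Arrays.asList("a", "b")));
  }
}
```

Passing MismatchedFn to run() with String input fails with an
IllegalArgumentException before any element is processed, which is the kind of
early signature error described above. A real framework would presumably also
resolve generic type arguments, which this sketch ignores.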

I'll update the Beam docs later with this information, but most information
accessible from ProcessContext, OnTimerContext, StartBundleContext, or
FinishBundleContext can now be accessed via this sort of injection. The
main exceptions are side inputs and output from @FinishBundle, both of which
still require the context objects; however, I hope to find time to provide
direct access to those as well.

pr/5331 (in progress) converts most of Beam's built-in transforms to use
this clearer style.

Reuven
