Being able to "late-bind" parameters like input paths to a pre-constructed program would be a very useful feature, and I think it is worth adding to Beam.
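For concreteness, here is a minimal, self-contained Python sketch of the
idea. The names below (RuntimeValueSupplier, bind, get) are illustrative
assumptions, not the actual API from the proposal; the point is only that
a parameter is named at graph construction time and resolved at execution
time.

class RuntimeValueSupplier:
    """Placeholder for a value that is only known at execution time."""

    def __init__(self, name, default=None):
        self.name = name
        self.default = default
        self._value = None
        self._bound = False

    def bind(self, value):
        # Hypothetically called by the runner at execution time,
        # e.g. from a --name=value flag supplied at job launch.
        self._value = value
        self._bound = True

    def get(self):
        # Deferred resolution: graph-construction code never calls this;
        # only execution-time code does.
        if self._bound:
            return self._value
        if self.default is not None:
            return self.default
        raise RuntimeError("Runtime value '%s' accessed before binding" % self.name)

# Graph construction time: the concrete input path is unknown, so the
# pipeline graph holds only the named placeholder.
input_path = RuntimeValueSupplier("input_path", default="gs://bucket/default.txt")

# Execution time: the runner binds the submitted value, and reads resolve it.
input_path.bind("gs://bucket/actual-input.txt")
print(input_path.get())  # -> gs://bucket/actual-input.txt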
Of the four API proposals, I have a strong preference for (4). Further, it seems that these need not be bound to the PipelineOptions object itself (i.e., a named RuntimeValueSupplier could be constructed off of a pipeline object). This is relevant for the Python API, which makes less heavy use of PipelineOptions (encouraging the user to use familiar, standard libraries for argument parsing), though of course such integration is useful to provide for convenience.

- Robert

On Fri, Jul 29, 2016 at 12:14 PM, Sam McVeety <[email protected]> wrote:
> During the graph construction phase, the given SDK generates an initial
> execution graph for the program. At execution time, this graph is
> executed, either locally or by a service. Currently, Beam only supports
> parameterization at graph construction time. Both Flink and Spark supply
> functionality that allows a pre-compiled job to be run, without SDK
> interaction, with updated runtime parameters.
>
> In its current incarnation, Dataflow can read values of PipelineOptions at
> job submission time, but this requires the presence of an SDK to properly
> encode these values into the job. We would like to build a common layer
> into the Beam model so that these dynamic options can be properly provided
> to jobs.
>
> Please see
> https://docs.google.com/document/d/1I-iIgWDYasb7ZmXbGBHdok_IK1r1YAJ90JG5Fz0_28o/edit
> for the high-level model, and
> https://docs.google.com/document/d/17I7HeNQmiIfOJi0aI70tgGMMkOSgGi8ZUH-MOnFatZ8/edit
> for the specific API proposal.
>
> Cheers,
> Sam
