+1, sounds like a good idea.

Spark's driver actually takes all dynamic parameters starting with "spark."
and propagates them into the SparkConf, which is shipped to the executors and
made available there via SparkEnv.
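(A minimal illustration of the SparkConf propagation described above; the
property name "spark.myapp.table" and the class are invented for the example,
assuming the property was set at submit time:)

    // Hypothetical property, e.g. submitted with:
    //   spark-submit --conf spark.myapp.table=events ...
    import org.apache.spark.SparkEnv;

    public class ReadDynamicConf {
      public static String tableName() {
        // SparkEnv.get() works on the driver and on the executors; its
        // SparkConf carries the "spark."-prefixed properties set at submit
        // time, so the value can be read without rebuilding the job.
        return SparkEnv.get().conf().get("spark.myapp.table", "defaultTable");
      }
    }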
I'm wondering: does this mean that PipelineOptions will be available to the
PTransform, or only the ValueSupplier? (Yes, (4) for me too, please.)

Thanks,
Amit

On Fri, Aug 5, 2016 at 5:41 PM Aljoscha Krettek <[email protected]> wrote:

> +1
>
> It's true that Flink provides a way to pass dynamic parameters to operator
> instances, but that's not used in any of the built-in sources and
> operators; they are instantiated with their parameters when the graph is
> constructed. So what you are suggesting for Beam would actually provide
> more functionality than what we currently have in Flink. :-)
>
> Out of the options I think (4) would be the best. (1) and (2) are not
> type-safe, correct? And (3) seems very boilerplate-y.
>
> Cheers,
> Aljoscha
>
> On Thu, 4 Aug 2016 at 21:53 Frances Perry <[email protected]> wrote:
>
> > +Amit, Aljoscha, Manu
> >
> > Any comments from folks on the Flink, Spark, or Gearpump runners?
> >
> > On Tue, Aug 2, 2016 at 11:10 AM, Robert Bradshaw
> > <[email protected]> wrote:
> >
> > > Being able to "late-bind" parameters like input paths to a
> > > pre-constructed program would be a very useful feature, and I think
> > > it is worth adding to Beam.
> > >
> > > Of the four API proposals, I have a strong preference for (4).
> > > Further, it seems that these need not be bound to the PipelineOptions
> > > object itself (i.e. a named RuntimeValueSupplier could be constructed
> > > off of a Pipeline object). The Python API makes less heavy use of
> > > PipelineOptions (encouraging the user to use familiar, standard
> > > libraries for argument parsing), though such integration is of course
> > > useful to provide for convenience.
> > >
> > > - Robert
> > >
> > > On Fri, Jul 29, 2016 at 12:14 PM, Sam McVeety <[email protected]> wrote:
> > >
> > > > During the graph construction phase, the given SDK generates an
> > > > initial execution graph for the program. At execution time, this
> > > > graph is executed, either locally or by a service. Currently, Beam
> > > > only supports parameterization at graph construction time. Both
> > > > Flink and Spark supply functionality that allows a pre-compiled job
> > > > to be run with updated runtime parameters, without further SDK
> > > > interaction.
> > > >
> > > > In its current incarnation, Dataflow can read values of
> > > > PipelineOptions at job submission time, but this requires the
> > > > presence of an SDK to properly encode these values into the job. We
> > > > would like to build a common layer into the Beam model so that
> > > > these dynamic options can be properly provided to jobs.
> > > >
> > > > Please see
> > > > https://docs.google.com/document/d/1I-iIgWDYasb7ZmXbGBHdok_IK1r1YAJ90JG5Fz0_28o/edit
> > > > for the high-level model, and
> > > > https://docs.google.com/document/d/17I7HeNQmiIfOJi0aI70tgGMMkOSgGi8ZUH-MOnFatZ8/edit
> > > > for the specific API proposal.
> > > >
> > > > Cheers,
> > > > Sam
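(For concreteness, a minimal sketch of what option (4), a named supplier
resolved at run time, might look like in the Java SDK. Every name below,
RuntimeValueSupplier, MyOptions, getInputPath, AddPathPrefixFn, is an
illustrative assumption, not the API from the linked proposal doc:)

    // Illustrative only: a late-bound value that is declared at graph
    // construction time but resolved at execution time on the runner.
    import java.io.Serializable;
    import org.apache.beam.sdk.options.PipelineOptions;
    import org.apache.beam.sdk.transforms.DoFn;

    interface RuntimeValueSupplier<T> extends Serializable {
      T get();                // only valid once the pipeline is running
      boolean isAccessible(); // false during graph construction
    }

    // A PipelineOptions interface could declare a late-bound parameter:
    interface MyOptions extends PipelineOptions {
      RuntimeValueSupplier<String> getInputPath();
      void setInputPath(RuntimeValueSupplier<String> value);
    }

    // A DoFn holds the supplier and defers reading it until execution:
    class AddPathPrefixFn extends DoFn<String, String> {
      private final RuntimeValueSupplier<String> inputPath;

      AddPathPrefixFn(RuntimeValueSupplier<String> inputPath) {
        this.inputPath = inputPath;
      }

      @ProcessElement
      public void processElement(ProcessContext c) {
        // Resolved on the worker at run time, not baked into the graph.
        c.output(inputPath.get() + "/" + c.element());
      }
    }

Under this reading, only the supplier's get() is guaranteed to work at
execution time; whether the full PipelineOptions object is also available to
the PTransform is exactly Amit's open question above.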
