On Fri, Oct 27, 2023 at 7:50 AM Kellen Dye via dev <dev@beam.apache.org> wrote:
>
> > Auto is hard, because it would involve
> > querying the runner before pipeline construction, and we may not even
> > know what the runner is at this point
>
> At the point where pipeline construction will start, you should have access 
> to the pipeline arguments and be able to determine the runner. What seems to 
> be missing is a place to query the runner pre-construction. If that query 
> could return metadata about the currently running version of the job, then 
> that could be incorporated into graph construction as necessary.

While this is the common case, it is not true in general. For example
it's possible to cache the pipeline proto and submit it to a separate
choice of runner later. We have Jobs API implementations that
forward/proxy the job to other runners, and the Python interactive
runner is another example where the runner is late-binding (e.g. one
tries a sample locally, and if all looks good can execute remotely,
and also in this case the graph that's submitted is often mutated
before running).

Also, in the spirit of the portability story, the pipeline definition
itself should be runner-independent.

> That same hook could be a place to for example return the currently-running 
> job graph for pre-submission compatibility checks.

I suppose we could add something to the Jobs API to make "looking up a
previous version of this pipeline" runner-agnostic, though that
assumes it's available at construction time. And +1 as Kellen says we
should define (and be able to check) what pipeline compatibility means
in a via graph-to-graph comparison at the Beam level. I'll defer both
of these as future work as part of the "make update a portable Beam
concept" project.

Reply via email to