On Fri, Oct 27, 2023 at 7:50 AM Kellen Dye via dev <dev@beam.apache.org> wrote:
>
> > Auto is hard, because it would involve
> > querying the runner before pipeline construction, and we may not even
> > know what the runner is at this point
>
> At the point where pipeline construction will start, you should have access
> to the pipeline arguments and be able to determine the runner. What seems to
> be missing is a place to query the runner pre-construction. If that query
> could return metadata about the currently running version of the job, then
> that could be incorporated into graph construction as necessary.
While this is the common case, it is not true in general. For example, it's possible to cache the pipeline proto and submit it to a separate choice of runner later. We have Jobs API implementations that forward/proxy the job to other runners, and the Python interactive runner is another example where the runner is late-binding (e.g. one tries a sample locally and, if all looks good, executes remotely; in this case the graph that's submitted is also often mutated before running). Also, in the spirit of the portability story, the pipeline definition itself should be runner-independent.

> That same hook could be a place to for example return the currently-running
> job graph for pre-submission compatibility checks.

I suppose we could add something to the Jobs API to make "looking up a previous version of this pipeline" runner-agnostic, though that assumes it's available at construction time. And +1: as Kellen says, we should define (and be able to check) what pipeline compatibility means via a graph-to-graph comparison at the Beam level. I'll defer both of these as future work as part of the "make update a portable Beam concept" project.
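To make the graph-to-graph idea slightly more concrete, here is a toy sketch in Python. Everything in it is hypothetical: the `Transform` and `Graph` types, the `incompatibilities` function, the transform-kind and coder strings, and above all the compatibility rule itself, which is just one assumed rule (every named transform in the running graph must survive in the new graph with the same kind and the same input/output coders; added transforms are allowed). It works on a simplified dict rather than the real Beam pipeline proto, and is not a proposal for what Beam's actual definition should be.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Transform:
    """Hypothetical, simplified stand-in for a transform node in a pipeline graph."""
    kind: str                      # illustrative label for the transform type
    input_coders: Tuple[str, ...]  # illustrative coder names of consumed inputs
    output_coders: Tuple[str, ...] # illustrative coder names of produced outputs


# A graph here is simply: stable unique transform name -> transform node.
Graph = Dict[str, Transform]


def incompatibilities(old: Graph, new: Graph) -> List[str]:
    """List reasons why `new` cannot replace `old` under the assumed toy rule.

    Assumed rule: every transform in the running (old) graph must still exist
    in the new graph under the same name, with the same kind and coders.
    Transforms that only exist in the new graph are allowed.
    """
    problems: List[str] = []
    for name, old_t in sorted(old.items()):
        new_t = new.get(name)
        if new_t is None:
            problems.append(f"{name}: removed")
        elif new_t.kind != old_t.kind:
            problems.append(f"{name}: transform kind changed")
        elif (new_t.input_coders, new_t.output_coders) != (
            old_t.input_coders,
            old_t.output_coders,
        ):
            problems.append(f"{name}: coder changed")
    return problems


if __name__ == "__main__":
    running = {
        "Read": Transform("read", (), ("bytes",)),
        "Count": Transform("combine", ("kv<str,int>",), ("kv<str,long>",)),
    }
    proposed = {
        "Read": Transform("read", (), ("bytes",)),
        "Count": Transform("combine", ("kv<str,int>",), ("kv<str,int>",)),
        "Log": Transform("pardo", ("kv<str,int>",), ()),
    }
    print(incompatibilities(running, proposed))  # prints ['Count: coder changed']
```

The point of doing the check on the graph, rather than inside a particular runner, is that it only needs the two pipeline definitions, so it would work the same regardless of which runner the job is eventually submitted to.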