On Wed, May 22, 2019 at 6:17 PM Maximilian Michels wrote:

> Hi,
>
> Robert and I were discussing the subject of user-specified
> environments for external transforms [1]. We couldn't decide whether
> users should have direct control over the environment when they use an
> external transform in their pipeline.
>
> In my mind, it is quite natural that the Expansion Service is a
> long-running service that gets started with a list of available
> environments.


+1.

IMHO, the expansion service should be expected to provide valid
environments for the transforms it vendors. Removing this expectation seems
wrong. Making it cheap to specify non-default dependencies without building
(publishing, etc.) a docker image is probably key to making this work well
(and also allowing more powerful environment introspection).


> Such a list can be outdated and users may write transforms
> for a new environment they want to use in their pipeline.


This is the part that I'm having trouble following. Writing a new transform
involves updating the expansion service to include the new transform. The
author of a transform (in other words, the one who defines its expansion
and implementation) is in a position to name its dependencies, etc., whereas
the user of the transform (the one invoking it) is generally not in a good
position to know which environments would be valid.


> The easiest
> way would be to allow passing the environment along with the transform.


What this allows is using existing transforms in new environments. There
are possibly some use cases for this, e.g. expansion of a given transform
may be compatible with either version X or version Y of a library, left up
to the discretion of the caller, but I think that this is really just a
deficiency in our environment specifications (e.g. one should be able to
express this flexibility in the returned environment).
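To make the "express this flexibility in the returned environment" idea concrete, here is a minimal toy sketch, assuming an environment description that lists a set of acceptable versions per dependency instead of a single pin. The `EnvironmentSpec` class, its fields, and `accepts` are hypothetical names for illustration only; they are not part of Beam's API or its runner protos.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class EnvironmentSpec:
    """Hypothetical environment returned by an expansion service (not a Beam API)."""
    container_image: str
    # Acceptable versions per dependency, e.g. {"mylib": {"X", "Y"}}.
    dependencies: Dict[str, Set[str]] = field(default_factory=dict)

    def accepts(self, requested: Dict[str, str]) -> bool:
        """Return True if every version the caller requests is one the author allows."""
        return all(
            name in self.dependencies and version in self.dependencies[name]
            for name, version in requested.items()
        )


# The transform author declares that either version X or Y of "mylib" works.
env = EnvironmentSpec("example/transform-env:latest", {"mylib": {"X", "Y"}})
print(env.accepts({"mylib": "Y"}))  # caller picks Y -> True
print(env.accepts({"mylib": "Z"}))  # Z was never declared valid -> False
```

In this sketch the author still owns the set of valid environments; the caller only chooses within it, which is the distinction being argued for here.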


> Note
> that we already give users control over the "main" environment via the
> PortablePipelineOptions, so this wouldn't be an entirely new concept.
>

Yes, the author of a pipeline/transform chooses the environment in which
those transforms execute.


> The contrary position is that the Expansion Service should have full
> control over which environment is chosen. Going back to the discussion
> about artifact staging [2], this could enable further
> optimizations, such as merging environments or detecting conflicts.
> However, this only works if this information has been provided upfront
> to the Expansion Service. It wouldn't be impossible to provide these
> hints alongside the environment, as suggested in the previous
> paragraph.
>
> Any opinions? Should we allow users to optionally specify an environment
> for external transforms?
>
> Thanks,
> Max
>
> [1] https://github.com/apache/beam/pull/8639
> [2] https://lists.apache.org/thread.html/6fcee7047f53cf1c0636fb65367ef70842016d57effe2e5795c4137d@%3Cdev.beam.apache.org%3E
>
