Re: Environments for External Transforms

Chamikara Jayalath Wed, 22 May 2019 11:09:35 -0700

On Wed, May 22, 2019 at 9:17 AM Maximilian Michels <[email protected]> wrote:


> Hi,
>
> Robert and me were discussing on the subject of user-specified
> environments for external transforms [1]. We couldn't decide whether
> users should have direct control over the environment when they use an
> external transform in their pipeline.
>
> In my mind, it is quite natural that the Expansion Service is a
> long-running service that gets started with a list of available
> environments. Such a list can be outdated and users may write transforms
> for a new environment they want to use in their pipeline. The easiest
> way would be to allow to pass the environment with the transform. Note
> that we already give users control over the "main" environment via the
> PortablePipelineOptions, so this wouldn't be an entirely new concept.
>


I think we are trying to generalize the expansion service along multiple
axes.
(1) dependencies
    (a) dependencies embedded in an environment
    (b) dependencies specific to a transform
    (c) dependencies specified by the user expanding the transform

(2) environments
    (a) default environment
    (b) environments specified at startup of the expansion service
    (c) environments specified by the user expanding the transform
        (this proposal)
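
To make the three options in (2) concrete, here is a rough sketch of how an
expansion service might pick an environment. Every name in it (Environment,
ExpansionServiceSketch, resolve, allow_user_envs) is made up for illustration
and is not a Beam API; the URN strings are only example values.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass(frozen=True)
class Environment:
    # Hypothetical stand-in for an environment description, not Beam's proto.
    urn: str
    payload: str = ""


@dataclass
class ExpansionServiceSketch:
    default_env: Environment                                # option 2(a)
    startup_envs: Dict[str, Environment] = field(default_factory=dict)  # 2(b)
    allow_user_envs: bool = False                           # gate for 2(c)

    def resolve(self,
                env_id: Optional[str] = None,
                user_env: Optional[Environment] = None) -> Environment:
        # 2(c): an environment passed along with the transform wins,
        # but only if the service was configured to accept it.
        if user_env is not None:
            if not self.allow_user_envs:
                raise ValueError("user-specified environments are disabled")
            return user_env
        # 2(b): look up an environment registered when the service started.
        if env_id is not None:
            if env_id not in self.startup_envs:
                raise ValueError(f"unknown environment id: {env_id}")
            return self.startup_envs[env_id]
        # 2(a): fall back to the single default environment.
        return self.default_env


service = ExpansionServiceSketch(
    default_env=Environment("beam:env:docker:v1", "default-image"),
    startup_envs={"py": Environment("beam:env:docker:v1", "python-image")},
)
print(service.resolve().payload)             # default environment, 2(a)
print(service.resolve(env_id="py").payload)  # startup environment, 2(b)
```

With allow_user_envs left at False the service behaves like 2(a)/2(b) only;
flipping it on is roughly what 2(c) would add.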

It would be great if we could implement the most generic solution along all
these axes, but I think we risk ending up with broken combinations if we try
to implement this before we have the other pieces needed to support a
long-running expansion service, for example, support for dynamically
registering transforms and support for discovering transforms.

What is the need for implementing 2(c) now? If there's no real need now, I
suggest we settle on 2(a) or 2(b) until we can truly support a long-running
expansion service. Also, we'll have a better idea of how this kind of feature
should evolve once we have at least two runners supporting cross-language
transforms (we are in the process of updating Dataflow to support this).
Just my 2 cents though :)


>
> The contrary position is that the Expansion Service should have full
> control over which environment is chosen. Going back to the discussion
> about artifact staging [2], this could enable to perform more
> optimizations, such as merging environments or detecting conflicts.
> However, this only works if this information has been provided upfront
> to the Expansion Service. It wouldn't be impossible to provide these
> hints alongside with the environment like suggested in the previous
> paragraph.
>
> Any opinions? Should we allow users to optionally specify an environment
> for external transforms?
>
> Thanks,
> Max
>
> [1] https://github.com/apache/beam/pull/8639
> [2] https://lists.apache.org/thread.html/6fcee7047f53cf1c0636fb65367ef70842016d57effe2e5795c4137d@%3Cdev.beam.apache.org%3E
>
