> Which environment would be used to perform the expansion? I think this is an
> interesting option, as long as it does not introduce a hard dependency on
> docker.

The same environment that the to-be-expanded transform requires during runtime.

> Dataflow has been doing something similar along these lines: it is trying
> to get rid of the driver program running on the user's machine. If you can
> get the expansion service to launch and run an environment to perform the
> expansion, you could also get it to create and submit a job, returning
> data about the running job.

Portability already runs without a driver on the user's machine, apart from expansion and staging. For anything runtime-related the job server kicks in. It's worth thinking about delegating expansion and staging to the job server as well.
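
To make the two positions in this thread concrete, here is a rough sketch in plain Python. It is not Beam API: `Environment`, `ExpansionService`, `expand`, and the example URN and configs are all hypothetical names chosen for illustration. It only models the choice under discussion, namely whether the caller may pass an environment with the transform or whether the expansion service picks the one declared by the transform's author.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass(frozen=True)
class Environment:
    # Hypothetical stand-in for a portable environment spec (not Beam's proto).
    kind: str    # e.g. "DOCKER" or "PROCESS"
    config: str  # e.g. an image name or a boot command


@dataclass
class ExpansionService:
    # Maps a transform URN to the environment declared by the transform's author.
    default_envs: Dict[str, Environment] = field(default_factory=dict)

    def expand(self, urn: str, user_env: Optional[Environment] = None) -> Environment:
        """Return the environment the expanded transform would run in."""
        if urn not in self.default_envs:
            raise KeyError(f"unknown transform: {urn}")
        # Position 1: the caller passes an environment along with the transform.
        if user_env is not None:
            return user_env
        # Position 2: the service decides, based on what the author declared.
        return self.default_envs[urn]


# Made-up URN and configs, for illustration only.
service = ExpansionService({"example:urn:read": Environment("DOCKER", "example/java-sdk")})
chosen_default = service.expand("example:urn:read")
chosen_override = service.expand("example:urn:read", Environment("PROCESS", "./boot"))
```

In this toy model the override simply wins. A real service could instead validate the requested environment against what the author declared, which is closer to expressing the flexibility in the returned environment.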

On 24.05.19 23:48, Lukasz Cwik wrote:
Dataflow has been doing something similar along these lines: it is trying to get rid of the driver program running on the user's machine. If you can get the expansion service to launch and run an environment to perform the expansion, you could also get it to create and submit a job, returning data about the running job.

On Thu, May 23, 2019 at 7:47 AM Thomas Weise wrote:

    On Thu, May 23, 2019 at 3:46 AM Maximilian Michels wrote:

         >  Writing a new transform involves updating the expansion
        service to include their new transform.

        Would it be conceivable that the expansion is performed via the
        environment? That would solve the problem of updating the expansion
        service, although it adds additional complexity for bringing up the
        environment.


    Which environment would be used to perform the expansion? I think
    this is an interesting option, as long as it does not introduce a
    hard dependency on docker.

        On 23.05.19 11:31, Robert Bradshaw wrote:
         > On Wed, May 22, 2019 at 6:17 PM Maximilian Michels wrote:
         >
         >     Hi,
         >
         >     Robert and I were discussing the subject of user-specified
         >     environments for external transforms [1]. We couldn't decide
         >     whether users should have direct control over the environment
         >     when they use an external transform in their pipeline.
         >
         >     In my mind, it is quite natural that the Expansion Service is a
         >     long-running service that gets started with a list of available
         >     environments.
         >
         >
         > +1.
         >
         > IMHO, the expansion service should be expected to provide valid
         > environments for the transforms it vendors. Removing this
         > expectation seems wrong. Making it cheap to specify non-default
         > dependencies without building (publishing, etc.) a docker image is
         > probably key to making this work well (and also allowing more
         > powerful environment introspection).
         >
         >     Such a list can be outdated and users may write transforms
         >     for a new environment they want to use in their pipeline.
         >
         >
         > This is the part that I'm having trouble following. Writing a new
         > transform involves updating the expansion service to include their
         > new transform. The author of a transform (in other words, the one
         > who defines its expansion and implementation) is in the position to
         > name its dependencies, etc., and the user of the transform (the one
         > invoking it) is generally not in a good position to know what
         > environments would be valid.
         >
         >     The easiest
         >     way would be to allow passing the environment with the transform.
         >
         >
         > What this allows is using existing transforms in new environments.
         > There are possibly some use cases for this, e.g. expansion of a
         > given transform may be compatible with either version X or version Y
         > of a library, left up to the discretion of the caller, but I think
         > that this is really just a deficiency in our environment
         > specifications (e.g. one should be able to express this flexibility
         > in the returned environment).
         >
         >     Note
         >     that we already give users control over the "main" environment
         >     via the PortablePipelineOptions, so this wouldn't be an entirely
         >     new concept.
         >
         >
         > Yes, the author of a pipeline/transform chooses the environment in
         > which those transforms execute.
         >
         >     The contrary position is that the Expansion Service should have
         >     full control over which environment is chosen. Going back to the
         >     discussion about artifact staging [2], this could enable more
         >     optimizations, such as merging environments or detecting
         >     conflicts. However, this only works if this information has been
         >     provided upfront to the Expansion Service. It wouldn't be
         >     impossible to provide these hints along with the environment, as
         >     suggested in the previous paragraph.
         >
         >     Any opinions? Should we allow users to optionally specify an
         >     environment for external transforms?
         >
         >     Thanks,
         >     Max
         >
         >     [1] https://github.com/apache/beam/pull/8639
         >     [2]
         >     https://lists.apache.org/thread.html/6fcee7047f53cf1c0636fb65367ef70842016d57effe2e5795c4137d@%3Cdev.beam.apache.org%3E
         >
