We'd like to avoid giving PTransforms access to the pipeline options during
pipeline construction. There are a few compelling reasons for doing so. The
biggest one is that the context in which the pipeline is constructed and
the context in which it executes may not be the same.

As an example, if I am executing a pipeline on some remote execution
engine, I may want to operate with more limited (or broader) permissions
than I have available locally. Those permissions should be determined at
the time the pipeline is executed; making decisions based on the
construction-time environment may prevent the pipeline from performing
correctly.

Credentials are an especially thorny issue, as some pipelines might want to
have steps that execute with different credentials (to make sure the
permissions for reading and writing are minimally scoped, for example).

Transforms that need objects or configuration currently passed via
pipeline options should generally take those arguments explicitly rather
than receive them through some invisible channel. We had this capability
prior to version 2.0 and removed the ability to access those options in
favor of explicit configuration.

Access to pipeline options at execution time is usually reasonable, though
if there's a way of passing more explicit configuration it would likely be
preferred. Most of the objects under discussion should be instantiated
during execution with the appropriate pipeline options as an argument, or
have the options available as a context parameter when invoked.
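
For the execution-time case, the following sketch is loosely modeled on the `DoFn` lifecycle (construct, then `setup`, then `process`). It uses only the standard library, and every name in it is hypothetical. In particular, handing an options dictionary to `setup` is an assumption of this sketch, not the Beam API. The point is only that the object is constructed with plain settings. It builds its client from options that the runner supplies on the worker.

```python
from typing import Dict, Iterator, List, Optional


class Client:
    """Hypothetical storage client created on the worker."""

    def __init__(self, region: str, token: str):
        self.region = region
        self.token = token

    def lookup(self, key: str) -> str:
        return f"{key}@{self.region}"


class EnrichFn:
    """DoFn-like class (hypothetical): constructed with plain, picklable
    settings; it builds its client only once execution-time options arrive."""

    def __init__(self, table: str):
        self.table = table
        self._client: Optional[Client] = None

    def setup(self, options: Dict[str, str]) -> None:
        # Called by the (hypothetical) runner on the worker, so the region and
        # token reflect the execution environment, not the launching machine.
        self._client = Client(options["region"], options["token"])

    def process(self, element: str) -> Iterator[str]:
        if self._client is None:
            raise RuntimeError("setup() must run before process()")
        yield f"{self.table}/{self._client.lookup(element)}"


def run_on_worker(fn: EnrichFn, elements: List[str],
                  worker_options: Dict[str, str]) -> List[str]:
    """Toy stand-in for a runner: it supplies options only at execution time."""
    fn.setup(worker_options)
    return [out for e in elements for out in fn.process(e)]


fn = EnrichFn("users")  # construction time: no options needed or available
print(run_on_worker(fn, ["a", "b"], {"region": "eu-west", "token": "worker-token"}))
# ['users/a@eu-west', 'users/b@eu-west']
```

The same shape works if the runner instead passes the options as a per-invocation context argument to `process`. In either case, nothing about the execution environment is captured at construction time.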

On Tue, Jul 11, 2017 at 3:03 PM, Dmitry Demeshchuk <[email protected]>
wrote:

> Hi folks,
>
> Sometimes, it would be very useful if PTransforms had access to global
> pipeline options, such as various credentials, settings and so on.
>
> Per conversation in https://issues.apache.org/jira/browse/BEAM-2572, I'd
> like to kick off a discussion about that.
>
> This would be beneficial for at least one major use case: support for
> different cloud providers (AWS, Azure, etc.) and the ability to specify each
> provider's credentials just once in the pipeline options.
>
> It looks like the trickiest part is not to make the PTransform objects have
> access to pipeline options (we could possibly just modify the
> Pipeline.apply
> <https://github.com/apache/beam/blob/master/sdks/python/apache_beam/pipeline.py#L355>
> method), but to actually pass these options down the line, such as to DoFn
> objects and FileSystem objects.
>
> I'm still in the process of reading the code and understanding what this
> could look like, so any input would be really appreciated.
>
> Thank you.
>
> --
> Best regards,
> Dmitry Demeshchuk.
>
