Hi Dmitry,

This is a very worthwhile discussion that has recently come up on
StackOverflow, here: https://stackoverflow.com/a/45024542/4820657

We actually recently _removed_ PipelineOptions from Pipeline.apply in
Java, since they tend to give transforms implicit dependencies that make
them non-portable. Baking in credentials would probably fall into this
category.

The other aspect to this is that we want to be able to build a pipeline and
run it later, in an environment chosen when we decide to run it. So
PipelineOptions are really for running, not building, a Pipeline. You can
still use them for arg parsing and passing specific values to transforms -
that is essentially orthogonal and just accidentally conflated.
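To make that separation concrete, here is a minimal sketch in plain Python
(not real Beam API; the transform and flag names are made up for
illustration). Arg parsing happens at launch, and the parsed value is
handed explicitly to the transform that needs it, rather than the
transform reaching into global options:

```python
import argparse


class WriteToStore:
    """Hypothetical transform: its configuration is passed explicitly at
    construction time, so its behavior is visible and portable."""

    def __init__(self, credentials):
        self.credentials = credentials  # explicit, not pulled from globals


def build(argv=None):
    # Arg parsing is orthogonal to pipeline construction: parse whatever
    # flags you like, then hand specific values to specific transforms.
    parser = argparse.ArgumentParser()
    parser.add_argument("--store_credentials", default="anonymous")
    args, _ = parser.parse_known_args(argv)
    return WriteToStore(credentials=args.store_credentials)


transform = build(["--store_credentials", "secret-token"])
print(transform.credentials)  # -> secret-token
```

The point is that the transform itself never consults a global options
object; everything it depends on arrives through its constructor.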

I can't speak to the Python SDK's maturity in this regard, but there is a
concept of a "ValueProvider": a deferred value that can be specified via
PipelineOptions when you run your pipeline. This may be what you want. You
build a PTransform passing some of its configuration parameters as
ValueProviders, and at run time you set them to actual values that are
passed to the UDFs in your pipeline.

Hope this helps. Despite not being deeply involved in Python, I wanted to
lay out the territory so someone else can comment further without having
to fill in the background.

Kenn

On Tue, Jul 11, 2017 at 3:03 PM, Dmitry Demeshchuk <[email protected]>
wrote:

> Hi folks,
>
> Sometimes, it would be very useful if PTransforms had access to global
> pipeline options, such as various credentials, settings and so on.
>
> Per conversation in https://issues.apache.org/jira/browse/BEAM-2572, I'd
> like to kick off a discussion about that.
>
> This would be beneficial for at least one major use case: support for
> different cloud providers (AWS, Azure, etc.) and the ability to specify each
> provider's credentials just once in the pipeline options.
>
> It looks like the trickiest part is not making the PTransform objects have
> access to pipeline options (we could possibly just modify the Pipeline.apply
> <https://github.com/apache/beam/blob/master/sdks/python/apache_beam/pipeline.py#L355>
> method), but actually passing these options down the road, such as to DoFn
> objects and FileSystem objects.
>
> I'm still in the process of reading the code and understanding what this
> could look like, so any input would be really appreciated.
>
> Thank you.
>
> --
> Best regards,
> Dmitry Demeshchuk.
>
