Hi Dmitry,

This is a very worthwhile discussion; it also came up recently on Stack Overflow, here: https://stackoverflow.com/a/45024542/4820657
We actually recently _removed_ PipelineOptions from Pipeline.apply in Java, since they tend to cause transforms to make implicit changes that render them non-portable. Baking in credentials would probably fall into this category. The other aspect is that we want to be able to build a pipeline and run it later, in an environment chosen when we decide to run it. So PipelineOptions are really for running, not building, a Pipeline. You can still use them for arg parsing and for passing specific values to transforms; that use is essentially orthogonal and just happens to be conflated with the first.

I can't speak to the Python SDK's maturity in this regard, but there is a concept called a "ValueProvider": a deferred value that can be specified via PipelineOptions when you run your pipeline. This may be what you want. You build a PTransform, passing some of its configuration parameters as ValueProviders, and at run time you set them to actual values that are passed to the UDFs in your pipeline.

Hope this helps. Despite not being deeply involved in Python, I wanted to lay out the territory so someone else could comment further without having to fill in the background.

Kenn

On Tue, Jul 11, 2017 at 3:03 PM, Dmitry Demeshchuk <[email protected]> wrote:

> Hi folks,
>
> Sometimes it would be very useful if PTransforms had access to global
> pipeline options, such as various credentials, settings, and so on.
>
> Per the conversation in https://issues.apache.org/jira/browse/BEAM-2572,
> I'd like to kick off a discussion about that.
>
> This would be beneficial for at least one major use case: supporting
> different cloud providers (AWS, Azure, etc.) and the ability to specify
> each provider's credentials just once in the pipeline options.
>
> It looks like the trickiest part is not making the PTransform objects
> have access to pipeline options (we could possibly just modify the
> Pipeline.apply
> <https://github.com/apache/beam/blob/master/sdks/python/apache_beam/pipeline.py#L355>
> method), but actually passing these options down the road, e.g. to DoFn
> objects and FileSystem objects.
>
> I'm still in the process of reading the code and understanding what this
> could look like, so any input would be really appreciated.
>
> Thank you.
>
> --
> Best regards,
> Dmitry Demeshchuk.
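[Editor's note: the ValueProvider idea mentioned above can be sketched in plain Python. This is an illustrative mock only; the class and method names below imitate, but are not, the actual Beam API, and the runtime-binding mechanism is a simplification.]

```python
class StaticValueProvider:
    """A value already known at pipeline construction time."""
    def __init__(self, value):
        self._value = value

    def get(self):
        return self._value


class RuntimeValueProvider:
    """A deferred value, bound later (e.g. from PipelineOptions at launch)."""
    _runtime_options = {}  # simplified stand-in for parsed pipeline options

    def __init__(self, key, default=None):
        self._key = key
        self._default = default

    def get(self):
        # Resolved only when called, i.e. at run time inside a UDF.
        return self._runtime_options.get(self._key, self._default)

    @classmethod
    def set_runtime_options(cls, options):
        cls._runtime_options = options


class AddGreetingDoFn:
    """A DoFn-like UDF configured with a deferred value."""
    def __init__(self, greeting_provider):
        # Note: no .get() here; the value stays deferred during construction.
        self.greeting = greeting_provider

    def process(self, element):
        return f"{self.greeting.get()}, {element}"


# Build the "pipeline" first, with the parameter still unbound...
fn = AddGreetingDoFn(RuntimeValueProvider("greeting", default="hello"))

# ...then bind options at run time, in whatever environment was chosen.
RuntimeValueProvider.set_runtime_options({"greeting": "bonjour"})
print(fn.process("Dmitry"))  # → bonjour, Dmitry
```

The key point the sketch shows is the separation Kenn describes: construction happens without any concrete option values, and resolution happens only when the UDF actually runs.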
