Templates, including ValueProviders, were recently added to the Python SDK. +1 to pursuing this train of thought (and, as I mentioned on the bug and as has been mentioned here, we don't want to add PipelineOptions access to PTransforms at construction time).
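For anyone following along, the ValueProvider idea (build the pipeline against a placeholder, bind the concrete value only at run time) can be sketched in plain Python. This is an illustrative sketch only, not the actual Beam API; the class and method names here just mimic the concept:

```python
# Illustrative sketch -- not the real Beam API. It mimics the ValueProvider
# idea: a transform is constructed against a placeholder, and the concrete
# value is bound later, when the pipeline is actually run.

class StaticValueProvider:
    """A value already known at construction time."""
    def __init__(self, value):
        self._value = value

    def get(self):
        return self._value


class RuntimeValueProvider:
    """A value bound only when the pipeline runs."""
    def __init__(self, option_name):
        self.option_name = option_name
        self._runtime_options = None

    def bind(self, options):
        # Called by the runner once the runtime options are known.
        self._runtime_options = options

    def get(self):
        if self._runtime_options is None:
            raise RuntimeError(
                '%s accessed before runtime binding' % self.option_name)
        return self._runtime_options[self.option_name]


class WriteToBucket:
    """A pretend DoFn configured with a deferred bucket name."""
    def __init__(self, bucket):
        self.bucket = bucket  # a ValueProvider, not a plain string

    def process(self, element):
        # .get() is only called inside process(), i.e. at run time.
        return 's3://%s/%s' % (self.bucket.get(), element)


# Build the pipeline first...
bucket = RuntimeValueProvider('bucket')
writer = WriteToBucket(bucket)
# ...then bind the options when we decide to run it.
bucket.bind({'bucket': 'my-prod-bucket'})
print(writer.process('file.txt'))
```

The point is that construction and execution are decoupled: the transform holds a handle, not the value, so the same built pipeline can be run later in a different environment.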
On Tue, Jul 11, 2017 at 3:21 PM, Kenneth Knowles <[email protected]> wrote:

> Hi Dmitry,
>
> This is a very worthwhile discussion that has recently come up on
> StackOverflow, here: https://stackoverflow.com/a/45024542/4820657
>
> We actually recently _removed_ the PipelineOptions from Pipeline.apply in
> Java, since they tend to cause transforms to have implicit changes that
> make them non-portable. Baking in credentials would probably fall into
> this category.
>
> The other aspect to this is that we want to be able to build a pipeline
> and run it later, in an environment chosen when we decide to run it. So
> PipelineOptions are really for running, not building, a Pipeline. You can
> still use them for arg parsing and passing specific values to transforms -
> that is essentially orthogonal and just accidentally conflated.
>
> I can't speak to the state of the Python SDK's maturity in this regard,
> but there is a concept of a "ValueProvider", which is a deferred value
> that can be specified by PipelineOptions when you run your pipeline. This
> may be what you want. You build a PTransform passing some of its
> configuration parameters as ValueProviders, and at run time you set them
> to actual values that are passed to the UDFs in your pipeline.
>
> Hope this helps. Despite not being deeply involved in Python, I wanted to
> lay out the territory so someone else could comment further without
> having to go into background.
>
> Kenn
>
> On Tue, Jul 11, 2017 at 3:03 PM, Dmitry Demeshchuk <[email protected]>
> wrote:
>
>> Hi folks,
>>
>> Sometimes it would be very useful if PTransforms had access to global
>> pipeline options, such as various credentials, settings and so on.
>>
>> Per the conversation in https://issues.apache.org/jira/browse/BEAM-2572,
>> I'd like to kick off a discussion about that.
>>
>> This would be beneficial for at least one major use case: support for
>> different cloud providers (AWS, Azure, etc.) and the ability to specify
>> each provider's credentials just once in the pipeline options.
>>
>> It looks like the trickiest part is not making the PTransform objects
>> have access to pipeline options (we could possibly just modify the
>> Pipeline.apply
>> <https://github.com/apache/beam/blob/master/sdks/python/apache_beam/pipeline.py#L355>
>> method), but actually passing these options down the road, such as to
>> DoFn objects and FileSystem objects.
>>
>> I'm still in the process of reading the code and understanding what this
>> could look like, so any input would be really appreciated.
>>
>> Thank you.
>>
>> --
>> Best regards,
>> Dmitry Demeshchuk.
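To make Kenn's "arg parsing and passing specific values to transforms" point concrete, here is a minimal sketch of that pattern using plain argparse instead of Beam's PipelineOptions. All names here (UploadFn, the flags, the default values) are illustrative, not part of any real API:

```python
# Sketch of the recommended pattern: parse options up front, then pass the
# specific values a transform needs into its constructor, rather than letting
# the transform reach into global pipeline options itself. Names illustrative.
import argparse


class UploadFn:
    """Stand-in for a DoFn: receives exactly the config it needs, explicitly."""
    def __init__(self, aws_key, aws_secret):
        self.aws_key = aws_key
        self.aws_secret = aws_secret

    def process(self, element):
        return 'uploading %s with key %s' % (element, self.aws_key)


parser = argparse.ArgumentParser()
parser.add_argument('--aws_key', default='test-key')
parser.add_argument('--aws_secret', default='test-secret')
# parse_known_args lets unrecognized runner flags pass through untouched.
args, _ = parser.parse_known_args([])

# The transform never sees the full option set, only the values it was given.
fn = UploadFn(args.aws_key, args.aws_secret)
print(fn.process('report.csv'))
```

This keeps transforms portable: their behavior depends only on what was explicitly handed to them at construction, which is exactly why global PipelineOptions access from inside a PTransform was removed on the Java side.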
