+1 to the above responses for passing options into PTransforms.

As Robert mentioned in the JIRA issue, filesystem plug-ins are in a
different category. It is reasonable for them to create credentials based
on options/environment variables. We could define a protocol for
instantiating filesystem plugins and pass them the pipeline options in
their entirety.
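
To make that concrete, here is a rough sketch of what such a protocol
could look like. Purely illustrative: register_filesystem,
make_aws_credentials and S3FileSystem are made-up names, not existing
Beam APIs.

    # Hypothetical sketch of a filesystem plugin protocol; none of these
    # names exist in Beam today.
    _FILESYSTEMS = {}

    def register_filesystem(fs_class):
        # The SDK would instantiate fs_class(pipeline_options) lazily, at
        # pipeline run time, keyed by URL scheme.
        _FILESYSTEMS[fs_class.scheme] = fs_class

    def make_aws_credentials(pipeline_options):
        # Derive credentials from the options (or fall back to the
        # environment); details elided. get_all_options() is a real
        # PipelineOptions method returning a dict of known options.
        opts = pipeline_options.get_all_options()
        return (opts.get('aws_access_key_id'),
                opts.get('aws_secret_access_key'))

    class S3FileSystem(object):
        """Hypothetical plugin handling s3:// paths."""

        scheme = 's3'

        def __init__(self, pipeline_options):
            # The plugin, not the PTransform, turns options into
            # credentials, so transforms themselves stay portable.
            self._credentials = make_aws_credentials(pipeline_options)

    register_filesystem(S3FileSystem)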

There is also the security aspect of all of this. None of the proposed
options will ensure that credentials passed this way remain secure: a
PTransform or filesystem plugin could log the credentials (or do
something even less secure) once it receives them.

Ahmet

On Tue, Jul 11, 2017 at 4:00 PM, Robert Bradshaw
<[email protected]> wrote:

> Templates, including ValueProviders, were recently added to the Python
> SDK. +1 to pursuing this train of thought (and, as I mentioned on the
> bug and as has been mentioned here, we don't want to give PTransforms
> access to PipelineOptions at construction time).
>
> On Tue, Jul 11, 2017 at 3:21 PM, Kenneth Knowles <[email protected]>
> wrote:
> > Hi Dmitry,
> >
> > This is a very worthwhile discussion that has recently come up on
> > StackOverflow, here: https://stackoverflow.com/a/45024542/4820657
> >
> > We actually recently _removed_ the PipelineOptions from Pipeline.apply
> > in Java, since they tend to cause transforms to have implicit changes
> > that make them non-portable. Baking in credentials would probably fall
> > into this category.
> >
> > The other aspect to this is that we want to be able to build a pipeline
> > and run it later, in an environment chosen when we decide to run it. So
> > PipelineOptions are really for running, not building, a Pipeline. You
> > can still use them for arg parsing and passing specific values to
> > transforms - that is essentially orthogonal and just accidentally
> > conflated.
> >
> > I can't speak to the state of the Python SDK's maturity in this regard,
> > but there is a concept of a "ValueProvider": a deferred value that can
> > be specified via PipelineOptions when you run your pipeline. This may
> > be what you want. You build a PTransform passing some of its
> > configuration parameters as ValueProviders, and at run time you set
> > them to actual values that are passed to the UDFs in your pipeline.
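> >
> > If the Python SDK mirrors Java here, a rough sketch might look like
> > the following (--greeting and AddGreeting are made-up examples):
> >
> >   import apache_beam as beam
> >   from apache_beam.options.pipeline_options import PipelineOptions
> >
> >   class MyOptions(PipelineOptions):
> >     @classmethod
> >     def _add_argparse_args(cls, parser):
> >       # Registers --greeting as a deferred (ValueProvider) argument.
> >       parser.add_value_provider_argument('--greeting', default='Hello')
> >
> >   class AddGreeting(beam.DoFn):
> >     def __init__(self, greeting):
> >       self.greeting = greeting  # a ValueProvider, not a plain string
> >
> >     def process(self, element):
> >       # .get() is only valid at run time, once the value is bound.
> >       yield '%s, %s' % (self.greeting.get(), element)
> >
> >   options = PipelineOptions().view_as(MyOptions)
> >   with beam.Pipeline(options=options) as p:
> >     p | beam.Create(['world']) | beam.ParDo(AddGreeting(options.greeting))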
> >
> > Hope this helps. Despite not being deeply involved in Python, I wanted
> > to lay out the territory so someone else could comment further without
> > having to go into background.
> >
> > Kenn
> >
> > On Tue, Jul 11, 2017 at 3:03 PM, Dmitry Demeshchuk
> > <[email protected]> wrote:
> >
> >> Hi folks,
> >>
> >> Sometimes, it would be very useful if PTransforms had access to global
> >> pipeline options, such as various credentials, settings and so on.
> >>
> >> Per the conversation in https://issues.apache.org/jira/browse/BEAM-2572,
> >> I'd like to kick off a discussion about that.
> >>
> >> This would be beneficial for at least one major use case: support for
> >> different cloud providers (AWS, Azure, etc.) and the ability to specify
> >> each provider's credentials just once in the pipeline options.
> >>
> >> It looks like the trickiest part is not to make the PTransform objects
> >> have access to pipeline options (we could possibly just modify the
> >> Pipeline.apply
> >> <https://github.com/apache/beam/blob/master/sdks/python/apache_beam/pipeline.py#L355>
> >> method), but to actually pass these options down the road, such as to
> >> DoFn objects and FileSystem objects.
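> >>
> >> For illustration, roughly this (hypothetical, not how Pipeline.apply
> >> works today; _original_apply is a made-up name):
> >>
> >>   # Hypothetical hook inside Pipeline.apply: stash the options on the
> >>   # transform so it can forward them to the DoFn/FileSystem objects it
> >>   # constructs.
> >>   def apply(self, transform, pvalueish=None):
> >>     transform.pipeline_options = self._options
> >>     return self._original_apply(transform, pvalueish)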
> >>
> >> I'm still in the process of reading the code and understanding what
> >> this could look like, so any input would be really appreciated.
> >>
> >> Thank you.
> >>
> >> --
> >> Best regards,
> >> Dmitry Demeshchuk.
> >>
>
