[ https://issues.apache.org/jira/browse/BEAM-4295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16528190#comment-16528190 ]
Thomas Groh commented on BEAM-4295:
-----------------------------------
{{ValueProvider}} isn't really designed for this usage. The pattern it supports is
late specification of values, not computation or other data retrieval.
Computations can instead be provided downstream as side inputs, using a
relatively simple pattern:
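{code:java}
// Compute the token inside the pipeline, then expose it downstream as a side input.
PCollectionView<MyToken> token = p
    .apply("Create Token", Create.of(0))
    .apply(MapElements.via(new SimpleFunction<Integer, MyToken>() {
      @Override public MyToken apply(Integer ignored) { return ...; }
    }))
    .apply(View.asSingleton());
{code}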
It may be worth wrapping this kind of functionality, but in general any data
retrieval or computation should be modeled within a {{PTransform}} rather than
a {{PipelineOption}}.
> Need ValueProvider that executes exactly once at pipeline runtime.
> ------------------------------------------------------------------
>
> Key: BEAM-4295
> URL: https://issues.apache.org/jira/browse/BEAM-4295
> Project: Beam
> Issue Type: Bug
> Components: runner-dataflow
> Reporter: Frank Yellin
> Assignee: Thomas Groh
> Priority: Minor
>
> When a Dataflow pipeline is started from a template, the value of a
> ValueProvider {{vp}} is evaluated either (1) when the template is created or
> (2) the first time that {{vp.get()}} is called in each instantiation of that
> value provider.
> There needs to be a mechanism for specifying that a ValueProvider is
> evaluated once, when the pipeline starts running, and that the value is the
> same across all instances. I cannot find any way to do so.
> The two obvious examples I can come up with are:
>
> {code:java}
> ValueProvider<Date> startTime;
> ValueProvider<String> shortLivedAccessToken;
> {code}
> The obvious rebuttal is that the user could pass --startTimeMs or
> --shortLivedAccessToken as a parameter when launching the Dataflow job.
>
> * For the access token, the user may not have permission to obtain this
> token, and repeatedly requesting a new token is expensive and may hit system
> request limits.
> * For the start time, the pipeline might be used to perform periodic
> maintenance in which old entries are deleted. A bad argument (accidental or
> malicious) that puts startTime in the future could cause the system to treat
> *everything* as old. There is no simple mechanism to check the passed
> parameter for reasonableness.
> I can get either of these as a side input, but not as a ValueProvider.
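A plain-Java sketch (JDK only, no Beam dependency) of the per-instance evaluation described in the issue may make the problem more concrete. {{PerInstanceLazy}} and {{PerInstanceDemo}} are hypothetical names: the class only models the reported behavior (each instance evaluates its supplier on its first {{get()}} and caches the result) and is not Beam's {{ValueProvider}}. Two instances stand in for copies of one provider on two workers, and a counting supplier stands in for an expensive call such as a token fetch.

{code:java}
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical stand-in modeling the reported behavior: the value is computed
// on the first get(), separately in every instance. This is NOT Beam's ValueProvider.
final class PerInstanceLazy<T> {
  private final Supplier<T> supplier;
  private T value;
  private boolean evaluated;

  PerInstanceLazy(Supplier<T> supplier) {
    this.supplier = supplier;
  }

  // Runs the supplier at most once per instance and caches the result.
  synchronized T get() {
    if (!evaluated) {
      value = supplier.get();
      evaluated = true;
    }
    return value;
  }
}

final class PerInstanceDemo {
  // Simulates two workers, each holding its own copy of the same provider.
  // Returns [value seen by worker A, value seen by worker B, total evaluations].
  static List<Integer> run() {
    AtomicInteger evaluations = new AtomicInteger();
    // Each call stands in for an expensive lookup (token fetch, clock read, ...).
    Supplier<Integer> expensiveLookup = evaluations::incrementAndGet;

    PerInstanceLazy<Integer> workerA = new PerInstanceLazy<>(expensiveLookup);
    PerInstanceLazy<Integer> workerB = new PerInstanceLazy<>(expensiveLookup);

    int seenByA = workerA.get();
    workerA.get(); // cached: the same instance does not evaluate again
    int seenByB = workerB.get(); // a different instance evaluates from scratch
    return Arrays.asList(seenByA, seenByB, evaluations.get());
  }
}
{code}

{{PerInstanceDemo.run()}} returns [1, 2, 2]: the two copies disagree, which is the startTime problem, and the supplier ran twice, which for a token would mean two fetches. Computing the value once inside the pipeline and reading it as a side input, as in the comment above, avoids per-instance divergence, although a retried bundle may still recompute it.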
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)