Frank Yellin created BEAM-4295:
----------------------------------
Summary: Need ValueProvider that executes exactly once at pipeline
runtime.
Key: BEAM-4295
URL: https://issues.apache.org/jira/browse/BEAM-4295
Project: Beam
Issue Type: Bug
Components: runner-dataflow
Reporter: Frank Yellin
Assignee: Thomas Groh
When a dataflow is started from a template, the value of a ValueProvider vp is
evaluated either (1) when the template is created or (2) the first time that
vp.get() is called in each instantiation of that value provider.
There needs to be a way to specify that a ValueProvider is evaluated exactly
once, when the pipeline starts running, and that every instance of it then
returns the same value. I cannot find any way to do so.
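To illustrate case (2), here is a minimal sketch in plain Java. It does not use
the Beam API: LazyProvider, FixedProvider, and OnceDemo are hypothetical
stand-ins, and a round trip through Java serialization is assumed to
approximate shipping a provider to a worker. Each deserialized copy of the lazy
provider evaluates on its own, while a value computed once before serialization
is the same in every copy.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a runtime-evaluated provider; not a Beam class. */
final class LazyProvider implements Serializable {
    // Counts evaluations in this JVM (stands in for "token requests").
    static final AtomicInteger EVALUATIONS = new AtomicInteger();

    // Transient: the cache is not serialized, so each copy starts empty.
    private transient Integer cached;

    synchronized int get() {
        if (cached == null) {
            cached = EVALUATIONS.incrementAndGet(); // the "expensive" evaluation
        }
        return cached;
    }
}

/** Hypothetical holder for a value computed before serialization. */
final class FixedProvider implements Serializable {
    private final int value;

    FixedProvider(int value) {
        this.value = value;
    }

    int get() {
        return value;
    }
}

final class OnceDemo {
    /** Round-trips an object through Java serialization to mimic a worker copy. */
    @SuppressWarnings("unchecked")
    static <T extends Serializable> T copy(T obj) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(obj);
            }
            try (ObjectInputStream in =
                    new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (T) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Two copies of one lazy provider evaluate independently.
        LazyProvider lazy = new LazyProvider();
        int a = copy(lazy).get();
        int b = copy(lazy).get();
        System.out.println("lazy copies agree: " + (a == b));

        // Evaluate once up front, then ship the fixed value to both copies.
        FixedProvider fixed = new FixedProvider(new LazyProvider().get());
        int c = copy(fixed).get();
        int d = copy(fixed).get();
        System.out.println("fixed copies agree: " + (c == d));
    }
}
```

The FixedProvider half of the sketch is the behavior being requested here, with
the difference that the single evaluation would have to happen when the
pipeline starts running rather than when the template is created.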
The two obvious examples I can come up with are:
{code:java}
ValueProvider<Date> startTime;
ValueProvider<String> shortLivedAccessToken;
{code}
The obvious rebuttal is that the user could pass --startTimeMs or
--shortLivedAccessToken as a parameter to the dataflow. Neither works well:
* For the access token, the user may not have the permissions to get this
token, and repeatedly requesting a new token is expensive and may hit system
request limits.
* For the "start time", the dataflow might be used to perform periodic
maintenance in which old entries are deleted. A bad argument (accidental or
malicious) putting startTime in the future could cause the system to think that
*everything* is old. There is no simple mechanism to check the passed
parameter for reasonableness.
I can get either of these as a side input, but not as a ValueProvider.