[ 
https://issues.apache.org/jira/browse/BEAM-4295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16528190#comment-16528190
 ] 

Thomas Groh commented on BEAM-4295:
-----------------------------------

ValueProvider isn't really designed for this usage: the general pattern it's 
used for is late specification of a value, not computation or other data 
retrieval. Computed values can instead be provided downstream as side inputs, 
which uses a relatively simple pattern: 
{code:java}
p.apply("Create Token", Create.of(0))
    .apply(MapElements.via(new SimpleFunction<Integer, MyToken>() {
      @Override public MyToken apply(Integer ignored) { return ...; }
    }));
{code}

It may be worth wrapping this kind of functionality, but in general any data 
retrieval or computation should be modeled within a {{PTransform}} rather than 
as a {{PipelineOptions}} value.
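
A fuller sketch of the side-input pattern, to show how the computed value 
reaches downstream transforms. The class name, the {{fetchToken()}} helper, and 
the use of a plain {{String}} as the token type are placeholders for 
illustration, not part of Beam or of the original report. As far as I 
understand the model, a runner may retry the computing step, so the function 
itself is not strictly guaranteed to run exactly once, but every consumer of 
the singleton view sees the same value.
{code:java}
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.transforms.SimpleFunction;
import org.apache.beam.sdk.transforms.View;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.PCollectionView;

public class TokenSideInputSketch {

  // Hypothetical stand-in for the real (expensive) token request.
  static String fetchToken() {
    return "token-" + System.currentTimeMillis();
  }

  public static void main(String[] args) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    // A single dummy element drives the token computation at pipeline run
    // time; the result is exposed to other transforms as a singleton view.
    PCollectionView<String> tokenView =
        p.apply("Create Token Seed", Create.of(0))
            .apply("Fetch Token", MapElements.via(
                new SimpleFunction<Integer, String>() {
                  @Override
                  public String apply(Integer ignored) {
                    return fetchToken();
                  }
                }))
            .apply("As Singleton", View.<String>asSingleton());

    // Placeholder main input; a real pipeline would read from a source.
    PCollection<String> requests = p.apply("Requests", Create.of("a", "b", "c"));

    // Each element reads the shared token from the side input instead of
    // calling get() on a ValueProvider.
    requests.apply("Use Token", ParDo.of(
        new DoFn<String, String>() {
          @ProcessElement
          public void processElement(ProcessContext c) {
            String token = c.sideInput(tokenView);
            c.output(c.element() + " authorized with " + token);
          }
        }).withSideInputs(tokenView));

    p.run().waitUntilFinish();
  }
}
{code}
The same shape should work for the "start time" case by returning a timestamp 
from the function instead of a token.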

> Need ValueProvider that executes exactly once at pipeline runtime.
> ------------------------------------------------------------------
>
>                 Key: BEAM-4295
>                 URL: https://issues.apache.org/jira/browse/BEAM-4295
>             Project: Beam
>          Issue Type: Bug
>          Components: runner-dataflow
>            Reporter: Frank Yellin
>            Assignee: Thomas Groh
>            Priority: Minor
>
> When a dataflow is started from a template, the value of a ValueProvider vp 
> is evaluated either (1) when the template is created or (2) the first time 
> that vp.get() is called in each instantiation of that value provider.
> There needs to be a mechanism of specifying that a ValueProvider is evaluated 
> once at the start of the running of the pipeline, and that the value is the 
> same among all instances.  I cannot find any way to do so.
> The two obvious examples I can come up with are:
>  
> {code:java}
>       ValueProvider<Date> startTime;
>       ValueProvider<String> shortLivedAccessToken;
> {code}
> The obvious rebuttal is that the user could pass --startTimeMs or 
> --shortTimeAccessToken as a parameter to the dataflow. 
>  
>  * For the access token, the user may not have the permissions to get this 
> token, and repeatedly requesting a new token is expensive and may hit system 
> request limits.
>  * For the "start time", the dataflow might be used to perform periodic 
> maintenance in which old entries are deleted.  A bad argument (accidental or 
> malicious) putting startTime in the future could cause the system to think 
> that *everything* is old.  There is no simple mechanism to check the passed 
> parameter for reasonableness.
> I can get either of these as a side input, but not as a ValueProvider.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
