[ 
https://issues.apache.org/jira/browse/BEAM-7983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-7983:
-------------------------------
    Status: Open  (was: Triage Needed)

> Template parameters don't work if they are only used in DoFns
> -------------------------------------------------------------
>
>                 Key: BEAM-7983
>                 URL: https://issues.apache.org/jira/browse/BEAM-7983
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-java-core
>            Reporter: Yunqing Zhou
>            Assignee: Luke Cwik
>            Priority: Minor
>
> Template parameters don't work if they are used only inside DoFns and 
> nowhere else in main().
> Sample pipeline:
>  
> {code:java}
> import org.apache.beam.sdk.Pipeline;
> import org.apache.beam.sdk.options.PipelineOptions;
> import org.apache.beam.sdk.options.PipelineOptionsFactory;
> import org.apache.beam.sdk.options.ValueProvider;
> import org.apache.beam.sdk.transforms.Create;
> import org.apache.beam.sdk.transforms.DoFn;
> import org.apache.beam.sdk.transforms.ParDo;
> public class BugPipeline {
>   public interface Options extends PipelineOptions {
>     ValueProvider<String> getFoo();
>     void setFoo(ValueProvider<String> foo);
>   }
>   public static void main(String[] args) throws Exception {
>     Options options = PipelineOptionsFactory.fromArgs(args).as(Options.class);
>     Pipeline p = Pipeline.create(options);
>     p.apply(Create.of(1)).apply(ParDo.of(new DoFn<Integer, String>() {
>       @ProcessElement
>       public void processElement(ProcessContext context) {
>         System.out.println(context.getPipelineOptions().as(Options.class).getFoo());
>       }
>     }));
>     p.run();
>   }
> }
> {code}
> The option "foo" is used only inside the DoFn. To reproduce the problem, 
> build the template and then launch a job from it:
> {code:bash}
> $ java BugPipeline --project=$PROJECT --stagingLocation=$STAGING \
>     --templateLocation=$TEMPLATE --runner=DataflowRunner
> $ gcloud dataflow jobs run $NAME --gcs-location=$TEMPLATE --parameters=foo=bar
> {code}
> The gcloud command fails with this error:
> {code}
> ERROR: (gcloud.dataflow.jobs.run) INVALID_ARGUMENT: (2621bec26c2488b7): The 
> workflow could not be created. Causes: (2621bec26c248dba): Found unexpected 
> parameters: ['foo' (perhaps you meant 'zone')]
> - '@type': type.googleapis.com/google.rpc.DebugInfo
>   detail: "(2621bec26c2488b7): The workflow could not be created. Causes: 
> (2621bec26c248dba):\
>     \ Found unexpected parameters: ['foo' (perhaps you meant 'zone')]"
> {code}
> The underlying problem is that ProxyInvocationHandler.java only adds 
> options that have been "invoked" to the pipeline options map in the job 
> object:
> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ProxyInvocationHandler.java#L159
> One way to solve this is to always save every ValueProvider-typed parameter 
> in the pipeline options section. Alternatively, a registration mechanism 
> could be introduced.
> A current workaround is to annotate the parameter with 
> {code}@Validation.Required{code}, which calls invoke() behind the scenes.
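
The behaviour described above can be illustrated without Beam at all. The following is only a minimal sketch using a plain java.lang.reflect.Proxy; it is not Beam's actual ProxyInvocationHandler code, and the class, interface, and method names (InvokedOptionsSketch, RecordingHandler, recordedNames, and so on) are hypothetical. It assumes a handler that records an option only when its getter is called, which is the pattern the report describes: an option that is never touched in main() never reaches the recorded map, while touching the getter once (as the validation workaround is reported to do) makes it appear.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class InvokedOptionsSketch {
  /** Hypothetical options interface standing in for a PipelineOptions subinterface. */
  public interface Options {
    String getFoo();
    String getZone();
  }

  /** Simplified stand-in handler: an option is recorded only when its getter is invoked. */
  public static final class RecordingHandler implements InvocationHandler {
    private final Map<String, String> defaults;
    private final Map<String, String> recorded = new TreeMap<>();

    public RecordingHandler(Map<String, String> defaults) {
      this.defaults = defaults;
    }

    @Override
    public Object invoke(Object proxy, Method method, Object[] args) {
      String name = method.getName();
      if (!name.startsWith("get") || method.getParameterCount() != 0) {
        throw new UnsupportedOperationException(name);
      }
      // Derive the option name from the getter name, e.g. "getFoo" -> "foo".
      String key = Character.toLowerCase(name.charAt(3)) + name.substring(4);
      String value = defaults.get(key);
      recorded.put(key, value); // side effect happens only on invocation
      return value;
    }

    /** Names of the options that would be written out, in sorted order. */
    public Set<String> recordedNames() {
      return recorded.keySet();
    }
  }

  public static Map<String, String> sampleDefaults() {
    Map<String, String> defaults = new HashMap<>();
    defaults.put("foo", "unset");
    defaults.put("zone", "us-central1-a");
    return defaults;
  }

  public static Options newOptions(RecordingHandler handler) {
    return (Options) Proxy.newProxyInstance(
        Options.class.getClassLoader(), new Class<?>[] {Options.class}, handler);
  }

  public static void main(String[] args) {
    RecordingHandler handler = new RecordingHandler(sampleDefaults());
    Options options = newOptions(handler);

    options.getZone(); // only "zone" is touched, as if main() never used foo
    System.out.println(handler.recordedNames()); // prints [zone]

    options.getFoo(); // touching the getter once is enough to record it
    System.out.println(handler.recordedNames()); // prints [foo, zone]
  }
}
```

In this sketch, anything that serialises only the recorded names after the first call would omit "foo", which matches the "Found unexpected parameters" symptom in spirit; how Beam and Dataflow actually serialise template metadata is not shown here.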


