[ https://issues.apache.org/jira/browse/AIRFLOW-2009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16731890#comment-16731890 ]

Siro commented on AIRFLOW-2009:
-------------------------------

I encountered this too.

How did you work around this? Did you change the hook code manually?
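One workaround I have seen suggested (not from this thread; the function names and keyfile path below are illustrative) is to export `GOOGLE_APPLICATION_CREDENTIALS` into the subprocess environment before the hook shells out, so the launch step authenticates with the same service account the connection's keyfile provides:

```python
import os
import subprocess


def dataflow_env(keyfile_path, base_env=None):
    """Return a copy of the environment with GOOGLE_APPLICATION_CREDENTIALS
    pointing at the connection's keyfile, so a spawned Dataflow launch
    command authenticates as that service account instead of the default
    GCE account. keyfile_path is illustrative, not taken from the issue."""
    env = dict(base_env if base_env is not None else os.environ)
    env["GOOGLE_APPLICATION_CREDENTIALS"] = keyfile_path
    return env


def launch_dataflow(cmd, keyfile_path):
    """Spawn the launch command with the patched environment, mirroring
    what _Dataflow(cmd) does but with explicit credentials."""
    return subprocess.Popen(cmd, env=dataflow_env(keyfile_path))
```

This only papers over the bug; the real fix would be for the hook itself to derive the launch credentials from `gcp_conn_id`.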

> DataFlowHook does not use correct service account
> -------------------------------------------------
>
>                 Key: AIRFLOW-2009
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-2009
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: Dataflow, hooks
>    Affects Versions: 2.0.0
>            Reporter: Jessica Laughlin
>            Priority: Major
>
> We have been using the DataFlowOperator to schedule DataFlow jobs.
> We found that the DataFlowHook used by the DataFlowOperator doesn't actually 
> use the passed `gcp_conn_id` to schedule the Dataflow job; it only uses it 
> to poll the job's status afterwards. 
> The relevant code 
> (https://github.com/apache/incubator-airflow/blob/master/airflow/contrib/hooks/gcp_dataflow_hook.py#L158):
>         _Dataflow(cmd).wait_for_done()
>         _DataflowJob(self.get_conn(), variables['project'],
>                      name, self.poll_sleep).wait_for_done()
> The first line here should also be using self.get_conn(). 
> As a result, our DataFlowOperator tasks have actually been scheduling 
> Dataflow jobs with the default Google Compute Engine service account (which 
> has Dataflow permissions). Our provided service account (which does not have 
> Dataflow permissions) is only used in the second line, which is where the 
> permissions error surfaces. 
> I would like to fix this bug, but have to work around it at the moment due to 
> time constraints. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
