[
https://issues.apache.org/jira/browse/AIRFLOW-2009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Kaxil Naik resolved AIRFLOW-2009.
---------------------------------
Resolution: Fixed
Fix Version/s: 1.10.3
> DataFlowHook does not use correct service account
> -------------------------------------------------
>
> Key: AIRFLOW-2009
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2009
> Project: Apache Airflow
> Issue Type: Bug
> Components: Dataflow, hooks
> Affects Versions: 2.0.0
> Reporter: Jessica Laughlin
> Assignee: Feng Lu
> Priority: Major
> Fix For: 1.10.3
>
>
> We have been using the DataFlowOperator to schedule DataFlow jobs.
> We found that the DataFlowHook used by the DataFlowOperator doesn't actually
> use the passed `gcp_conn_id` to schedule the Dataflow job, but only to read
> the job's results afterwards.
> The relevant code (https://github.com/apache/incubator-airflow/blob/master/airflow/contrib/hooks/gcp_dataflow_hook.py#L158):
> _Dataflow(cmd).wait_for_done()
> _DataflowJob(self.get_conn(), variables['project'], name,
>              self.poll_sleep).wait_for_done()
> The first line here should also be using self.get_conn().
> For this reason, our tasks using the DataFlowOperator have actually been
> scheduling Dataflow jobs with the default Google Compute Engine service
> account (which has Dataflow permissions). The permissions error only surfaces
> in the second line, where our provided service account (which does not have
> Dataflow permissions) is used.
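> A minimal sketch of the kind of fix involved (hypothetical, not the actual
> Airflow patch; `dataflow_env` and `key_file_path` are illustrative names):
> the subprocess that launches the Dataflow job inherits the ambient
> environment and therefore falls back to the default GCE service account,
> so one option is to point GOOGLE_APPLICATION_CREDENTIALS at the
> connection's key file before spawning the command:

```python
import os

def dataflow_env(key_file_path, base_env=None):
    """Build the environment for the Dataflow launch subprocess.

    If the Airflow connection supplies a service-account key file
    (``key_file_path``, an assumed parameter for this sketch), inject it
    as GOOGLE_APPLICATION_CREDENTIALS so the launcher authenticates as
    that account instead of the default Compute Engine account.
    """
    env = dict(base_env if base_env is not None else os.environ)
    if key_file_path:
        env["GOOGLE_APPLICATION_CREDENTIALS"] = key_file_path
    return env
```

> The launch call would then pass this env to the subprocess (e.g. via
> subprocess.Popen's ``env`` argument), mirroring how the second line
> already authenticates through self.get_conn().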
> I would like to fix this bug, but have to work around it at the moment due to
> time constraints.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)