[
https://issues.apache.org/jira/browse/AIRFLOW-1401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16082777#comment-16082777
]
Peter Dolan commented on AIRFLOW-1401:
--------------------------------------
Sounds great, thanks for the pointer [~fenglu]. I'll include those changes in
any PR I prepare.
> Standardize GCP project, region, and zone argument names
> --------------------------------------------------------
>
> Key: AIRFLOW-1401
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1401
> Project: Apache Airflow
> Issue Type: Improvement
> Components: contrib
> Affects Versions: 1.8.1
> Reporter: Peter Dolan
> Assignee: Peter Dolan
>
> At the moment, there is no standard naming for Google Cloud Platform operator
> arguments across the contrib modules, most notably for the parameter that
> holds the GCP project name/ID. This makes it difficult to specify
> default_args that work across all GCP-centric operators in a DAG.
> Using the command `grep -r project airflow/contrib/*`, we can see these uses:
> project_id:
> * gcp_dataproc_hook
> * datastore_hook
> * gcp_api_base_hook
> * bigquery_hook
> * dataproc_operator
> * bigquery_sensor
> project:
> * gcp_pubsub_hook (here 'project' means the project ID or project name, a
> usage that blurs GCP's distinction between project ID and project name as
> elements of the REST API)
> * dataflow_operator (see note below)
> * pubsub_operator
> project_name:
> * gcp_cloudml_hook
> * cloudml_operator
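The following Airflow-free sketch shows why the split hurts. It uses a simplified model of how default_args reach an operator: only keys that match a constructor parameter name are applied, and explicit keyword arguments win. The classes `DataprocLikeOperator`, `PubSubLikeOperator`, `CloudMLLikeOperator` and the helper `build_operator` are hypothetical stand-ins, not the contrib classes.

```python
import inspect


class DataprocLikeOperator:
    """Hypothetical stand-in for an operator that takes project_id."""
    def __init__(self, task_id, project_id=None):
        self.task_id = task_id
        self.project_id = project_id


class PubSubLikeOperator:
    """Hypothetical stand-in for an operator that takes project."""
    def __init__(self, task_id, project=None):
        self.task_id = task_id
        self.project = project


class CloudMLLikeOperator:
    """Hypothetical stand-in for an operator that takes project_name."""
    def __init__(self, task_id, project_name=None):
        self.task_id = task_id
        self.project_name = project_name


def build_operator(cls, default_args, **kwargs):
    """Simplified model: apply only the defaults whose key matches a
    parameter of cls.__init__; explicit kwargs take precedence."""
    params = inspect.signature(cls.__init__).parameters
    merged = {k: v for k, v in default_args.items() if k in params}
    merged.update(kwargs)
    return cls(**merged)


default_args = {"project_id": "my-project-123"}

dataproc = build_operator(DataprocLikeOperator, default_args, task_id="a")
pubsub = build_operator(PubSubLikeOperator, default_args, task_id="b")
cloudml = build_operator(CloudMLLikeOperator, default_args, task_id="c")

print(dataproc.project_id)   # my-project-123
print(pubsub.project)        # None: 'project_id' never reached it
print(cloudml.project_name)  # None: same problem
```

One shared default reaches only the operators that use the same parameter name. The others silently receive None.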
> Notably, the Dataflow operator diverges from the pattern of top-level
> operator parameters: it takes an options dict, which can be populated from
> the dataflow_default_options dict. This dict can contain 'project' and 'zone'.
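Here is a minimal sketch of how the options dict might be resolved against dataflow_default_options. It assumes that task-level options override the defaults key by key. `resolve_dataflow_options` is a hypothetical helper that follows a copy-then-update pattern. It is not the operator's own code.

```python
import copy


def resolve_dataflow_options(dataflow_default_options, options):
    """Merge task-level options over the defaults without mutating either.

    Hypothetical helper: assumes task-level keys win on conflict.
    """
    merged = copy.deepcopy(dataflow_default_options or {})
    merged.update(options or {})
    return merged


defaults = {"project": "my-project-123", "zone": "us-central1-f"}
task_options = {"zone": "europe-west1-b", "output": "gs://my-bucket/out"}

resolved = resolve_dataflow_options(defaults, task_options)
print(resolved)
# {'project': 'my-project-123', 'zone': 'europe-west1-b', 'output': 'gs://my-bucket/out'}
```

In this layout 'project' sits inside a nested dict rather than being a top-level argument. A DAG-wide project_id default therefore never reaches it, which is the divergence described above.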
> Within the GCP API, three fields are used: project number, project ID,
> and project name. More details are here:
> https://cloud.google.com/resource-manager/reference/rest/v1/projects.
> Briefly, the project number is a unique int64 that GCP assigns automatically
> to identify the project. The project ID is a unique, user-assigned ID of 6-30
> characters. The project name is a user-assigned display name, which need not
> be unique and cannot be used to identify the project to the service. When
> users think of their project ID, name, or other identifier in the context of
> API calls, they are almost certainly thinking of the project ID.
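Only the project number and the project ID identify a project to the API, so the shape of a value can give a rough hint about which one a caller passed. The sketch below assumes this project ID format from the Resource Manager documentation: 6 to 30 characters drawn from lowercase letters, digits, and hyphens, starting with a letter and not ending with a hyphen. It treats an all-digit string as a project number. `classify_project_identifier` is a hypothetical helper, and this is only a heuristic. For example, a lowercase display name can look like a valid ID.

```python
import re

# Assumed project ID format: 6-30 chars of lowercase letters, digits and
# hyphens, starting with a letter and not ending with a hyphen.
PROJECT_ID_RE = re.compile(r"[a-z][a-z0-9-]{4,28}[a-z0-9]")
PROJECT_NUMBER_RE = re.compile(r"[0-9]+")


def classify_project_identifier(value):
    """Return 'project_number', 'project_id', or 'unknown'.

    A display name (free text, not unique) cannot be resolved to a
    project, so anything matching neither pattern is 'unknown'.
    """
    if PROJECT_NUMBER_RE.fullmatch(value):
        return "project_number"
    if PROJECT_ID_RE.fullmatch(value):
        return "project_id"
    return "unknown"


print(classify_project_identifier("415104041262"))    # project_number
print(classify_project_identifier("my-project-123"))  # project_id
print(classify_project_identifier("My Project"))      # unknown
```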
> This improvement proposes to standardize the above operators (at least) on
> * project_id (meaning '<project>' in this example request: GET
> https://www.googleapis.com/compute/v1/projects/<project>/zones/<zone>/instances/<instance>)
> * region
> * zone
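With the standardized names, the values drop straight into request paths such as the Compute Engine example above. This is a minimal sketch. `instance_url` and `region_from_zone` are hypothetical helpers, and the values are placeholders. `region_from_zone` relies on the GCP naming convention that a zone is its region plus a "-<letter>" suffix.

```python
from urllib.parse import quote

COMPUTE_BASE = "https://www.googleapis.com/compute/v1"


def instance_url(project_id, zone, instance):
    """Build the instances GET URL from standardized argument names."""
    parts = ["projects", project_id, "zones", zone, "instances", instance]
    # Percent-encode each path segment; hyphens and digits pass unchanged.
    return COMPUTE_BASE + "/" + "/".join(quote(p, safe="") for p in parts)


def region_from_zone(zone):
    """Assumes zone names look like '<region>-<letter>'."""
    return zone.rsplit("-", 1)[0]


print(instance_url("my-project-123", "us-central1-f", "worker-1"))
# https://www.googleapis.com/compute/v1/projects/my-project-123/zones/us-central1-f/instances/worker-1
print(region_from_zone("us-central1-f"))  # us-central1
```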
> This can be done by renaming the parameters of operators and hooks that were
> not included in the 1.8.1 release (Cloud ML and Pub/Sub), and by adding the
> new parameters to operators and hooks that were included in 1.8.1 (internally
> copying the old parameter's value to the new one and deprecating the old
> one).
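For hooks and operators already released in 1.8.1, the rename-with-deprecation path could look roughly like the sketch below. `ExampleGcpOperator` is hypothetical. Emitting a `DeprecationWarning` through `warnings.warn` and rejecting conflicting values are assumptions, not decisions recorded in this issue. The old `project` keyword is still accepted, and its value is copied into `project_id`.

```python
import warnings


class ExampleGcpOperator:
    """Hypothetical operator showing the project -> project_id migration."""

    def __init__(self, task_id, project_id=None, region=None, zone=None,
                 project=None):
        if project is not None:
            # Refuse to guess when the old and new names disagree.
            if project_id is not None and project_id != project:
                raise ValueError(
                    "Pass only project_id; 'project' is deprecated.")
            warnings.warn(
                "'project' is deprecated; use 'project_id' instead.",
                DeprecationWarning, stacklevel=2)
            # Copy the old parameter's value into the new name.
            project_id = project
        self.task_id = task_id
        self.project_id = project_id
        self.region = region
        self.zone = zone


# Usage: the legacy keyword still works but records a warning.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    op = ExampleGcpOperator(task_id="t", project="my-project-123")

print(op.project_id)                # my-project-123
print(caught[0].category.__name__)  # DeprecationWarning
```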
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)