[ https://issues.apache.org/jira/browse/AIRFLOW-1401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16082942#comment-16082942 ]
Wilson Lian commented on AIRFLOW-1401:
--------------------------------------

Thanks for writing this up, Peter. This looks good to me.

> Standardize GCP project, region, and zone argument names
> --------------------------------------------------------
>
> Key: AIRFLOW-1401
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1401
> Project: Apache Airflow
> Issue Type: Improvement
> Components: contrib
> Affects Versions: 1.8.1
> Reporter: Peter Dolan
> Assignee: Peter Dolan
>
> At the moment, operator arguments for Google Cloud Platform are not used
> consistently across the contrib modules, most notably the parameter that
> carries the GCP project name/id. This makes it difficult to specify
> default_args that work across all GCP-centric operators in a graph.
>
> Using the command `grep -r project airflow/contrib/*`, we can see these uses:
>
> project_id:
> * gcp_dataproc_hook
> * datastore_hook
> * gcp_api_base_hook
> * bigquery_hook
> * dataproc_operator
> * bigquery_sensor
>
> project:
> * gcp_pubsub_hook (here 'project' means project id or project name, which
>   blurs the distinction the GCP REST API makes between project id and
>   project name)
> * dataflow_operator (see note below)
> * pubsub_operator
>
> project_name:
> * gcp_cloudml_hook
> * cloudml_operator
>
> Notably, the Dataflow Operator diverges from the pattern of using top-level
> operator parameters by specifying an options dict, which can be populated by
> the dataflow_default_options dict. This can contain 'project' and 'zone'.
>
> Within the GCP API, there are three fields used: project number, project id,
> and project name. More details are here:
> https://cloud.google.com/resource-manager/reference/rest/v1/projects.
>
> Briefly, project number is a unique int64 that GCP assigns automatically to
> identify the project. Project ID is a 6-30 character unique user-assigned id.
> Project name is a user-assigned display name for the project, which need not
> be unique, and cannot be used to identify the project to the service. When
> users think of their project id, name, or other identifier within the context
> of API calls, they are almost certainly thinking of the project id.
>
> This improvement proposes to standardize the above operators (at least) on
> * project_id (meaning '<project>' in this example request: GET
>   https://www.googleapis.com/compute/v1/projects/<project>/zones/<zone>/instances/<instance>)
> * region
> * zone
>
> This can be done by changing the names of parameters of operators and hooks
> that were not included in the 1.8.1 release (cloud ml and pubsub), and by
> adding parameters to operators and hooks that were included in 1.8.1 (and
> internally copying the old parameter name to the new one, and deprecating the
> old one).

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
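
For illustration, here is a minimal sketch of the "copy the old parameter name to the new one, and deprecate the old one" step from the proposal, in plain Python. The names `resolve_project_id` and `ExampleGcpOperator` are hypothetical and stand in for real hooks and operators; this is not existing Airflow code, and the project, region, and zone values are placeholders.

```python
import warnings


def resolve_project_id(project_id=None, project=None, project_name=None):
    """Return the project id, accepting deprecated aliases.

    Hypothetical helper: 'project' and 'project_name' are the old
    spellings listed in the issue; 'project_id' is the proposed standard.
    """
    for old_name, old_value in (("project", project),
                                ("project_name", project_name)):
        if old_value is None:
            continue
        warnings.warn(
            "'%s' is deprecated; use 'project_id' instead" % old_name,
            DeprecationWarning,
            stacklevel=2,
        )
        if project_id is None:
            # Copy the old parameter's value to the new parameter.
            project_id = old_value
        elif project_id != old_value:
            raise ValueError(
                "conflicting values for 'project_id' and '%s'" % old_name
            )
    if project_id is None:
        raise ValueError("'project_id' is required")
    return project_id


class ExampleGcpOperator(object):
    """Stand-in for a GCP operator; not an actual Airflow class."""

    def __init__(self, project_id=None, region=None, zone=None, project=None):
        self.project_id = resolve_project_id(project_id=project_id,
                                             project=project)
        self.region = region
        self.zone = zone


# One set of defaults can be shared by every operator that follows
# the project_id / region / zone convention.
default_args = {
    "project_id": "my-project",   # placeholder value
    "region": "us-central1",      # placeholder value
    "zone": "us-central1-a",      # placeholder value
}
op = ExampleGcpOperator(**default_args)

# Old call sites keep working but emit a DeprecationWarning.
legacy = ExampleGcpOperator(project="my-project")
```

Raising on conflicting values is a design choice made for this sketch; a real migration might instead prefer the new name silently.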