[ 
https://issues.apache.org/jira/browse/AIRFLOW-1401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16082942#comment-16082942
 ] 

Wilson Lian commented on AIRFLOW-1401:
--------------------------------------

Thanks for writing this up, Peter. This looks good to me.

> Standardize GCP project, region, and zone argument names
> --------------------------------------------------------
>
>                 Key: AIRFLOW-1401
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-1401
>             Project: Apache Airflow
>          Issue Type: Improvement
>          Components: contrib
>    Affects Versions: 1.8.1
>            Reporter: Peter Dolan
>            Assignee: Peter Dolan
>
> At the moment, operator arguments for Google Cloud Platform are not used 
> consistently across the contrib modules, most notably the parameter that 
> carries the GCP project name/id. This makes it difficult to specify 
> default_args that work across all GCP-centric operators in a graph.
> Using the command `grep -r project airflow/contrib/*`, we can see these uses:
> project_id:
>  * gcp_dataproc_hook
>  * datastore_hook
>  * gcp_api_base_hook
>  * bigquery_hook
>  * dataproc_operator
>  * bigquery_sensor
> project:
>  * gcp_pubsub_hook (here 'project' means either project id or project name, 
> which blurs the distinction GCP makes between project id and project name 
> as elements of the REST API)
>  * dataflow_operator (see note below)
>  * pubsub_operator
> project_name:
>  * gcp_cloudml_hook
>  * cloudml_operator
> Notably, the Dataflow Operator diverges from the pattern of using top-level 
> operator parameters by specifying an options dict, which can be populated by 
> the dataflow_default_options dict. This can contain 'project' and 'zone'.
> Within the GCP API, there are three fields used: project number, project id, 
> and project name. More details are here: 
> https://cloud.google.com/resource-manager/reference/rest/v1/projects. 
> Briefly, project number is a unique int64 automatically assigned by GCP to 
> identify the project. Project ID is a unique, user-assigned id of 6-30 
> characters. Project name is a user-assigned display name for the project, 
> which need not be unique and cannot be used to identify the project to the 
> service. When 
> users think of their project id, name, or other identifier within the context 
> of API calls, they are almost certainly thinking of the project id.
> This improvement proposes to standardize the above operators (at least) on
>  * project_id (meaning '<project>' in this example request: GET 
> https://www.googleapis.com/compute/v1/projects/<project>/zones/<zone>/instances/<instance>)
>  * region
>  * zone
> This can be done by changing the names of parameters of operators and hooks 
> that were not included in the 1.8.1 release (cloud ml and pubsub), and by 
> adding parameters to operators and hooks that were included in 1.8.1 (and 
> internally copying the old parameter name to the new one, and deprecating the 
> old one).
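
For illustration only, the deprecation approach described in the last paragraph of the issue (accept both names, copy the old value to the new one, warn about the old one) might look roughly like the sketch below. The class ExampleGcpOperator and its constructor are hypothetical and are not actual Airflow code; the only assumption taken from the issue is that 'project' is the old parameter name and 'project_id' is the new one.

```python
import warnings


class ExampleGcpOperator:
    """Hypothetical operator used only to illustrate the rename.

    This is not an Airflow class; real operators take more arguments.
    """

    def __init__(self, project_id=None, project=None):
        if project is not None:
            # The old name still works, but callers are told to migrate.
            warnings.warn(
                "'project' is deprecated; use 'project_id' instead.",
                DeprecationWarning,
                stacklevel=2,
            )
            # Copy the old value over only when the new one was not given,
            # so an explicit project_id always wins.
            if project_id is None:
                project_id = project
        if project_id is None:
            raise ValueError("project_id is required")
        self.project_id = project_id
```

With a shim like this, existing callers that pass project='my-project' would keep working and see a DeprecationWarning, while new callers (and shared default arguments) would use project_id only.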



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
