[ 
https://issues.apache.org/jira/browse/AIRFLOW-1401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Dolan updated AIRFLOW-1401:
---------------------------------
    Description: 
At the moment, operator arguments for Google Cloud Platform are not named 
consistently across the contrib modules, most notably the parameter that 
carries the GCP project name/id. This makes it difficult to specify 
default_args that work across all GCP-centric operators in a graph.
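
To illustrate the problem, here is a minimal sketch in plain Python. It assumes, as a simplification, that defaults are forwarded only to parameters whose names match the operator's signature. The three functions are hypothetical stand-ins, not the real contrib operators.

```python
import inspect


def call_with_defaults(operator, default_args, **explicit):
    """Pass only the defaults whose names appear in the operator's signature."""
    accepted = inspect.signature(operator).parameters
    kwargs = {k: v for k, v in default_args.items() if k in accepted}
    kwargs.update(explicit)
    return operator(**kwargs)


# Hypothetical stand-ins for operators that name the same concept differently.
def bigquery_like(project_id=None):
    return project_id


def pubsub_like(project=None):
    return project


def cloudml_like(project_name=None):
    return project_name


# One shared default only reaches the operator that happens to use this name.
default_args = {"project_id": "my-project"}
results = {
    op.__name__: call_with_defaults(op, default_args)
    for op in (bigquery_like, pubsub_like, cloudml_like)
}
```

Under these assumptions only bigquery_like receives the project; the other two would need their own entries in default_args.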

Using the command `grep -r project airflow/contrib/*`, we can see these uses:

project_id:
 * gcp_dataproc_hook
 * datastore_hook
 * gcp_api_base_hook
 * bigquery_hook
 * dataproc_operator
 * bigquery_sensor

project:
 * gcp_pubsub_hook (here 'project' is documented as project id or project 
name, which blurs the distinction GCP makes between project id and project 
name as elements of the REST API)
 * dataflow_operator (see note below)
 * pubsub_operator

project_name:
 * gcp_cloudml_hook
 * cloudml_operator

Notably, the Dataflow Operator diverges from the pattern of using top-level 
operator parameters by specifying an options dict, which can be populated by 
the dataflow_default_options dict. This can contain 'project' and 'zone'.

Within the GCP API, three fields are used: project number, project id, and 
project name. More details are here: 
https://cloud.google.com/resource-manager/reference/rest/v1/projects. Briefly, 
the project number is a unique int64 that GCP assigns automatically to 
identify the project. The project id is a unique, user-assigned id of 6-30 
characters. The project name is a user-assigned display name for the project, 
which need not be unique and cannot be used to identify the project to the 
service. When users think of their project id, name, or other identifier 
within the context of API calls, they are almost certainly thinking of the 
project id.
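
As a rough sketch, a value can be checked against the project id format before it is passed to a hook. The 6-30 length comes from the description above. The character rules (lowercase letters, digits, and hyphens, starting with a letter, no trailing hyphen) are taken from the Resource Manager documentation linked above and are an assumption here, as is the helper name.

```python
import re

# Assumed format: 6-30 chars, lowercase letters/digits/hyphens,
# starts with a letter, does not end with a hyphen.
_PROJECT_ID_RE = re.compile(r"[a-z][a-z0-9-]{4,28}[a-z0-9]")


def looks_like_project_id(value):
    """Return True if value has the shape of a GCP project id.

    This is only a format check; it does not confirm the project exists.
    """
    return isinstance(value, str) and _PROJECT_ID_RE.fullmatch(value) is not None
```

A display name such as "My Project" or a numeric project number fails this check, which is a cheap way to catch the name/id mix-up early.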

This improvement proposes to standardize the above operators (at least) on
 * project_id (meaning '<project>' in this example request: GET 
https://www.googleapis.com/compute/v1/projects/<project>/zones/<zone>/instances/<instance>)
 * region
 * zone

This can be done by renaming the parameters of operators and hooks that were 
not included in the 1.8.1 release (Cloud ML and Pub/Sub), and by adding the 
new parameters to operators and hooks that were included in 1.8.1 (internally 
copying the old parameter's value to the new one, and deprecating the old 
one).
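
A minimal sketch of the deprecation path for already-released operators, assuming the old name is kept as an optional keyword and copied to project_id with a warning. ExampleGcpOperator and resolve_project_id are hypothetical names, not existing Airflow code.

```python
import warnings


def resolve_project_id(project_id=None, **legacy):
    """Return project_id, falling back to deprecated argument names.

    legacy maps old parameter names (e.g. 'project', 'project_name') to the
    values the caller supplied; None means the argument was not given.
    """
    for old_name, value in legacy.items():
        if value is None:
            continue
        warnings.warn(
            "'%s' is deprecated; use 'project_id' instead" % old_name,
            DeprecationWarning,
            stacklevel=2,
        )
        # An explicit project_id always wins over a legacy value.
        if project_id is None:
            project_id = value
    if project_id is None:
        raise ValueError("project_id is required")
    return project_id


class ExampleGcpOperator(object):
    """Hypothetical operator that accepts both the new and the old name."""

    def __init__(self, project_id=None, project=None):
        self.project_id = resolve_project_id(project_id, project=project)
```

Existing DAGs that pass project= keep working and see a DeprecationWarning, while new DAGs can set project_id once in their shared defaults.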

  was:
At the moment, there isn't standard usage of operator arguments for Google 
Cloud Platform across the contributions, primarily in the usage of the 
parameter meaning the GCP project name/id. This makes it difficult to specify 
default_arguments that work across all GCP-centric operators in a graph.

Using the command `grep -r project airflow/contrib/*`, we can see these uses:

project_id:
 * gcp_dataproc_hook
 * datastore_hook
 * gcp_api_base_hook
 * bigquery_hook
 * dataproc_operator
 * bigquery_sensor

project:
 * gcp_pubsub_hook (here 'project' means project id or project name, which does 
not fully understand the distinction within GCP between project id and project 
name as elements of the REST api)
 * dataflow_operator (see note below)
 * pubsub_operator

project_name:
 * gcp_cloudml_hook
 * cloudml_operator

Notably, the Dataflow Operator diverges from the pattern of using top-level 
operator parameters by specifying an options dict, which can be populated by 
the dataflow_default_options dict. This can contain 'project', and 'zone.'

This improvement proposes to standardize the above operators (at least) on
 * project_id (meaning '<project>' in this example request: GET 
https://www.googleapis.com/compute/v1/projects/<project>/zones/<zone>/instances/<instance>)
 * region
 * zone

This can be done by changing the names of parameters of operators and hooks 
that were not included in the 1.8.1 release (cloud ml and pubsub), and by 
adding parameters to operators and hooks that were included in 1.8.1 (and 
internally copying the old parameter name to the new one, and deprecating the 
old one).


> Standardize GCP project, region, and zone argument names
> --------------------------------------------------------
>
>                 Key: AIRFLOW-1401
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-1401
>             Project: Apache Airflow
>          Issue Type: Improvement
>          Components: contrib
>    Affects Versions: 1.8.1
>            Reporter: Peter Dolan
>            Assignee: Peter Dolan



