Repository: incubator-airflow

Updated Branches:
  refs/heads/master c1d02d91e -> 664c963c4
[AIRFLOW-2174] Fix typos and wrongly rendered documents

Fix typos and wrongly rendered documents in the
following pages:

- tutorial.html#default-arguments
- configuration.html#connections
- code.html#airflow.operators.hive_stats_operator.HiveStatsCollectionOperator
- code.html#airflow.operators.redshift_to_s3_operator.RedshiftToS3Transfer
- code.html#airflow.contrib.operators.dataflow_operator.DataFlowJavaOperator
- code.html#airflow.contrib.operators.dataflow_operator.DataflowTemplateOperator
- code.html#airflow.contrib.operators.dataproc_operator.DataprocWorkflowTemplateInstantiateOperator
- code.html#airflow.contrib.operators.mlengine_operator.MLEngineModelOperator
- code.html#airflow.contrib.operators.mlengine_operator.MLEngineVersionOperator
- code.html#airflow.models.DAG.following_schedule
- code.html#airflow.models.DAG.previous_schedule

Closes #3093 from sekikn/AIRFLOW-2174

Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/664c963c
Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/664c963c
Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/664c963c

Branch: refs/heads/master
Commit: 664c963c4c7bb86bcdb30d43d0c00c21172e08a8
Parents: c1d02d9
Author: Kengo Seki <[email protected]>
Authored: Sun Mar 4 18:11:17 2018 +0100
Committer: Fokko Driesprong <[email protected]>
Committed: Sun Mar 4 18:11:17 2018 +0100

----------------------------------------------------------------------
 airflow/contrib/operators/dataflow_operator.py | 79 +++++++++++----------
 airflow/contrib/operators/dataproc_operator.py |  2 +-
 airflow/contrib/operators/mlengine_operator.py |  7 +-
 airflow/models.py                              |  2 +
 airflow/operators/hive_stats_operator.py       | 18 +++--
 airflow/operators/redshift_to_s3_operator.py   |  1 +
 docs/configuration.rst                         |  4 +-
 docs/tutorial.rst                              |  2 +-
 8 files changed, 61 insertions(+), 54 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/664c963c/airflow/contrib/operators/dataflow_operator.py
----------------------------------------------------------------------
diff --git a/airflow/contrib/operators/dataflow_operator.py b/airflow/contrib/operators/dataflow_operator.py
index 89c005e..74e6e9e 100644
--- a/airflow/contrib/operators/dataflow_operator.py
+++ b/airflow/contrib/operators/dataflow_operator.py
@@ -33,32 +33,33 @@ class DataFlowJavaOperator(BaseOperator):
 
     .. code-block:: python
 
-       default_args = {
-           'dataflow_default_options': {
-               'project': 'my-gcp-project',
-               'zone': 'europe-west1-d',
-               'stagingLocation': 'gs://my-staging-bucket/staging/'
-           }
-       }
+        default_args = {
+            'dataflow_default_options': {
+                'project': 'my-gcp-project',
+                'zone': 'europe-west1-d',
+                'stagingLocation': 'gs://my-staging-bucket/staging/'
+            }
+        }
 
     You need to pass the path to your dataflow as a file reference with the ``jar``
-    parameter, the jar needs to be a self executing jar (see documentation here:
-    https://beam.apache.org/documentation/runners/dataflow/#self-executing-jar).
+    parameter, the jar needs to be a self executing jar (see documentation here:
+    https://beam.apache.org/documentation/runners/dataflow/#self-executing-jar).
     Use ``options`` to pass on options to your job.
 
     .. code-block:: python
-       t1 = DataFlowOperation(
-           task_id='datapflow_example',
-           jar='{{var.value.gcp_dataflow_base}}pipeline/build/libs/pipeline-example-1.0.jar',
-           options={
-               'autoscalingAlgorithm': 'BASIC',
-               'maxNumWorkers': '50',
-               'start': '{{ds}}',
-               'partitionType': 'DAY',
-               'labels': {'foo' : 'bar'}
-           },
-           gcp_conn_id='gcp-airflow-service-account',
-           dag=my-dag)
+
+        t1 = DataFlowOperation(
+            task_id='datapflow_example',
+            jar='{{var.value.gcp_dataflow_base}}pipeline/build/libs/pipeline-example-1.0.jar',
+            options={
+                'autoscalingAlgorithm': 'BASIC',
+                'maxNumWorkers': '50',
+                'start': '{{ds}}',
+                'partitionType': 'DAY',
+                'labels': {'foo' : 'bar'}
+            },
+            gcp_conn_id='gcp-airflow-service-account',
+            dag=my-dag)
 
     Both ``jar`` and ``options`` are templated so you can use variables in them.
     """
@@ -151,29 +152,31 @@ class DataflowTemplateOperator(BaseOperator):
     https://cloud.google.com/dataflow/docs/reference/rest/v1b3/RuntimeEnvironment
 
     .. code-block:: python
-       default_args = {
-           'dataflow_default_options': {
-               'project': 'my-gcp-project'
-               'zone': 'europe-west1-d',
-               'tempLocation': 'gs://my-staging-bucket/staging/'
-           }
-       }
-    }
+
+        default_args = {
+            'dataflow_default_options': {
+                'project': 'my-gcp-project'
+                'zone': 'europe-west1-d',
+                'tempLocation': 'gs://my-staging-bucket/staging/'
+            }
+        }
+    }
 
     You need to pass the path to your dataflow template as a file reference with the
     ``template`` parameter. Use ``parameters`` to pass on parameters to your job.
     Use ``environment`` to pass on runtime environment variables to your job.
 
     .. code-block:: python
-       t1 = DataflowTemplateOperator(
-           task_id='datapflow_example',
-           template='{{var.value.gcp_dataflow_base}}',
-           parameters={
-               'inputFile': "gs://bucket/input/my_input.txt",
-               'outputFile': "gs://bucket/output/my_output.txt"
-           },
-           gcp_conn_id='gcp-airflow-service-account',
-           dag=my-dag)
+
+        t1 = DataflowTemplateOperator(
+            task_id='datapflow_example',
+            template='{{var.value.gcp_dataflow_base}}',
+            parameters={
+                'inputFile': "gs://bucket/input/my_input.txt",
+                'outputFile': "gs://bucket/output/my_output.txt"
+            },
+            gcp_conn_id='gcp-airflow-service-account',
+            dag=my-dag)
 
     ``template``, ``dataflow_default_options`` and ``parameters`` are templated so you can
     use variables in them.


http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/664c963c/airflow/contrib/operators/dataproc_operator.py
----------------------------------------------------------------------
diff --git a/airflow/contrib/operators/dataproc_operator.py b/airflow/contrib/operators/dataproc_operator.py
index ebcc402..1de456a 100644
--- a/airflow/contrib/operators/dataproc_operator.py
+++ b/airflow/contrib/operators/dataproc_operator.py
@@ -983,7 +983,7 @@ class DataprocWorkflowTemplateInstantiateOperator(DataprocWorkflowTemplateBaseOp
     :param delegate_to: The account to impersonate, if any.
         For this to work, the service account making the request must have
        domain-wide delegation enabled.
-    :type delegate_to: string
+    :type delegate_to: string
     """
 
     @apply_defaults


http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/664c963c/airflow/contrib/operators/mlengine_operator.py
----------------------------------------------------------------------
diff --git a/airflow/contrib/operators/mlengine_operator.py b/airflow/contrib/operators/mlengine_operator.py
index e4451ab..0d033d3 100644
--- a/airflow/contrib/operators/mlengine_operator.py
+++ b/airflow/contrib/operators/mlengine_operator.py
@@ -280,8 +280,9 @@ class MLEngineModelOperator(BaseOperator):
     :type model: dict
 
     :param operation: The operation to perform. Available operations are:
-        'create': Creates a new model as provided by the `model` parameter.
-        'get': Gets a particular model where the name is specified in `model`.
+
+        * ``create``: Creates a new model as provided by the `model` parameter.
+        * ``get``: Gets a particular model where the name is specified in `model`.
 
     :param gcp_conn_id: The connection ID to use when fetching connection info.
     :type gcp_conn_id: string
@@ -350,10 +351,12 @@ class MLEngineVersionOperator(BaseOperator):
     :type version: dict
 
     :param operation: The operation to perform. Available operations are:
+
         * ``create``: Creates a new version in the model specified by
          `model_name`, in which case the `version` parameter should contain all
          the information to create that version
          (e.g. `name`, `deploymentUrl`).
+
        * ``get``: Gets full information of a particular version in the model
          specified by `model_name`.
          The name of the version should be specified in the `version`
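For readers of the ``create``/``get`` operation list fixed above, a minimal usage sketch may help. It is an illustration only: the project id, model name and DAG id below are made-up placeholders, not values taken from this commit.

    from datetime import datetime

    from airflow import DAG
    from airflow.contrib.operators.mlengine_operator import MLEngineModelOperator

    dag = DAG('mlengine_model_example', start_date=datetime(2018, 3, 1),
              schedule_interval=None)

    # operation='create' registers the model described by the `model` dict;
    # operation='get' would instead fetch the model named in `model`.
    create_model = MLEngineModelOperator(
        task_id='create_model',
        project_id='my-gcp-project',   # illustrative placeholder
        model={'name': 'my_model'},    # illustrative placeholder
        operation='create',
        gcp_conn_id='google_cloud_default',
        dag=dag)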
http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/664c963c/airflow/models.py
----------------------------------------------------------------------
diff --git a/airflow/models.py b/airflow/models.py
index 20996e0..ae0f68e 100755
--- a/airflow/models.py
+++ b/airflow/models.py
@@ -3151,6 +3151,7 @@ class DAG(BaseDag, LoggingMixin):
     def following_schedule(self, dttm):
         """
         Calculates the following schedule for this dag in local time
+
         :param dttm: utc datetime
         :return: utc datetime
         """
@@ -3165,6 +3166,7 @@ class DAG(BaseDag, LoggingMixin):
     def previous_schedule(self, dttm):
         """
         Calculates the previous schedule for this dag in local time
+
         :param dttm: utc datetime
         :return: utc datetime
         """
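The two ``DAG`` methods touched above accept and return UTC datetimes, as the clarified docstrings note. A minimal sketch of calling them follows; the DAG id and dates are arbitrary examples, not taken from this commit.

    from datetime import datetime

    from airflow import DAG
    from airflow.utils import timezone

    dag = DAG('schedule_example', schedule_interval='@daily',
              start_date=datetime(2018, 1, 1))

    now = timezone.utcnow()                   # timezone-aware UTC datetime
    next_tick = dag.following_schedule(now)   # next cron tick, as a UTC datetime
    prev_tick = dag.previous_schedule(now)    # previous cron tick, as a UTC datetime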
http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/664c963c/airflow/operators/hive_stats_operator.py
----------------------------------------------------------------------
diff --git a/airflow/operators/hive_stats_operator.py b/airflow/operators/hive_stats_operator.py
index 896547e..f3923e7 100644
--- a/airflow/operators/hive_stats_operator.py
+++ b/airflow/operators/hive_stats_operator.py
@@ -28,16 +28,14 @@ class HiveStatsCollectionOperator(BaseOperator):
     """
     Gathers partition statistics using a dynamically generated Presto
     query, inserts the stats into a MySql table with this format. Stats
-    overwrite themselves if you rerun the same date/partition.
-
-    ``
-    CREATE TABLE hive_stats (
-        ds VARCHAR(16),
-        table_name VARCHAR(500),
-        metric VARCHAR(200),
-        value BIGINT
-    );
-    ``
+    overwrite themselves if you rerun the same date/partition. ::
+
+        CREATE TABLE hive_stats (
+            ds VARCHAR(16),
+            table_name VARCHAR(500),
+            metric VARCHAR(200),
+            value BIGINT
+        );
 
     :param table: the source table, in the format ``database.table_name``
     :type table: str


http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/664c963c/airflow/operators/redshift_to_s3_operator.py
----------------------------------------------------------------------
diff --git a/airflow/operators/redshift_to_s3_operator.py b/airflow/operators/redshift_to_s3_operator.py
index 5553a2a..33ea343 100644
--- a/airflow/operators/redshift_to_s3_operator.py
+++ b/airflow/operators/redshift_to_s3_operator.py
@@ -20,6 +20,7 @@ from airflow.utils.decorators import apply_defaults
 class RedshiftToS3Transfer(BaseOperator):
     """
     Executes an UNLOAD command to s3 as a CSV with headers
+
     :param schema: reference to a specific schema in redshift database
     :type schema: string
     :param table: reference to a specific table in redshift database


http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/664c963c/docs/configuration.rst
----------------------------------------------------------------------
diff --git a/docs/configuration.rst b/docs/configuration.rst
index f5e81d6..46ac615 100644
--- a/docs/configuration.rst
+++ b/docs/configuration.rst
@@ -101,7 +101,7 @@ connections by following steps below:
 3. Replace ``airflow.cfg`` fernet_key value with the one from step 2.
 
 Alternatively, you can store your fernet_key in OS environment variable. You
-do not need to change ``airflow.cfg`` in this case as AirFlow will use environment
+do not need to change ``airflow.cfg`` in this case as Airflow will use environment
 variable over the value in ``airflow.cfg``:
 
 .. code-block:: bash
@@ -109,7 +109,7 @@ variable over the value in ``airflow.cfg``:
     # Note the double underscores
     EXPORT AIRFLOW__CORE__FERNET_KEY = your_fernet_key
 
-4. Restart AirFlow webserver.
+4. Restart Airflow webserver.
 5. For existing connections (the ones that you had defined before installing ``airflow[crypto]`` and creating a Fernet key), you need to open each connection in the connection admin UI, re-type the password, and save it.
 
 Connections in Airflow pipelines can be created using environment variables.


http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/664c963c/docs/tutorial.rst
----------------------------------------------------------------------
diff --git a/docs/tutorial.rst b/docs/tutorial.rst
index 8d2203c..df9eb6d 100644
--- a/docs/tutorial.rst
+++ b/docs/tutorial.rst
@@ -129,7 +129,7 @@ of default parameters that we can use when creating tasks.
     }
 
 For more information about the BaseOperator's parameters and what they do,
-refer to the :py:class:``airflow.models.BaseOperator`` documentation.
+refer to the :py:class:`airflow.models.BaseOperator` documentation.
 Also, note that you could easily define different sets of arguments that
 would serve different purposes. An example of that would be to have
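To illustrate the tutorial passage above about defining different sets of default arguments, here is a small self-contained sketch; all values are arbitrary examples, and only the ``default_args`` handling is the point.

    from datetime import datetime, timedelta

    from airflow import DAG
    from airflow.operators.bash_operator import BashOperator

    prod_args = {
        'owner': 'airflow',
        'depends_on_past': False,
        'start_date': datetime(2018, 3, 1),
        'retries': 3,
        'retry_delay': timedelta(minutes=5),
    }
    dev_args = dict(prod_args, retries=0)  # a second set, e.g. for development runs

    dag = DAG('tutorial_default_args', default_args=prod_args,
              schedule_interval=timedelta(days=1))

    # Tasks created with dag=dag inherit default_args unless an argument
    # is overridden on the operator itself.
    t1 = BashOperator(task_id='print_date', bash_command='date', dag=dag)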
