This is an automated email from the ASF dual-hosted git repository.

potiuk pushed a commit to branch v2-0-test
in repository https://gitbox.apache.org/repos/asf/airflow.git
commit fefaff67b7ca0ba6199cff9dfeba0198261c4cc3
Author: John Bampton <[email protected]>
AuthorDate: Sun Mar 7 20:28:54 2021 +1000

    Fix grammar and remove duplicate words (#14647)

    * chore: fix grammar and remove duplicate words

    (cherry picked from commit 6dc24c95e3bb46ac42fc80b1948aa79ae6c6fbd1)
---
 .github/workflows/build-images-workflow-run.yml                   | 2 +-
 BREEZE.rst                                                        | 2 +-
 IMAGES.rst                                                        | 2 +-
 PULL_REQUEST_WORKFLOW.rst                                         | 4 ++--
 airflow/jobs/scheduler_job.py                                     | 2 +-
 airflow/models/dag.py                                             | 2 +-
 airflow/models/dagrun.py                                          | 2 +-
 .../providers/apache/hive/example_dags/example_twitter_README.md  | 2 +-
 airflow/providers/apache/hive/example_dags/example_twitter_dag.py | 2 +-
 .../example_dags/example_cloud_storage_transfer_service_aws.py    | 2 +-
 .../example_dags/example_cloud_storage_transfer_service_gcp.py    | 2 +-
 airflow/providers/google/cloud/operators/dataflow.py              | 8 ++++----
 airflow/providers/google/cloud/operators/dataproc.py              | 2 +-
 airflow/providers/google/suite/hooks/sheets.py                    | 2 +-
 airflow/providers/google/suite/transfers/gcs_to_gdrive.py         | 2 +-
 airflow/www/templates/airflow/graph.html                          | 2 +-
 breeze                                                            | 2 +-
 chart/values.yaml                                                 | 4 ++--
 dev/provider_packages/prepare_provider_packages.py                | 2 +-
 .../operators/cloud/kubernetes_engine.rst                         | 2 +-
 docs/apache-airflow/dag-run.rst                                   | 2 +-
 docs/apache-airflow/production-deployment.rst                     | 2 +-
 docs/apache-airflow/upgrading-to-2.rst                            | 4 ++--
 23 files changed, 29 insertions(+), 29 deletions(-)

diff --git a/.github/workflows/build-images-workflow-run.yml b/.github/workflows/build-images-workflow-run.yml
index e5f8b41..33c7a79 100644
--- a/.github/workflows/build-images-workflow-run.yml
+++ b/.github/workflows/build-images-workflow-run.yml
@@ -556,5 +556,5 @@ jobs:
           cancelMode: self
           notifyPRCancel: true
           notifyPRCancelMessage: |
-            Building images for the PR has failed. Follow the the workflow link to check the reason.
+            Building images for the PR has failed. Follow the workflow link to check the reason.
           sourceRunId: ${{ github.event.workflow_run.id }}

diff --git a/BREEZE.rst b/BREEZE.rst
index 4f78977..874cb46 100644
--- a/BREEZE.rst
+++ b/BREEZE.rst
@@ -2283,7 +2283,7 @@ This is the current syntax for `./breeze <./breeze>`_:
                 update-breeze-file update-extras update-local-yml-file update-setup-cfg-file
                 version-sync yamllint

-      You can pass extra arguments including options to to the pre-commit framework as
+      You can pass extra arguments including options to the pre-commit framework as
       <EXTRA_ARGS> passed after --. For example:

       'breeze static-check mypy' or

diff --git a/IMAGES.rst b/IMAGES.rst
index 3011b28..2871ba0 100644
--- a/IMAGES.rst
+++ b/IMAGES.rst
@@ -436,7 +436,7 @@ Customizing the image

 Customizing the image is an alternative way of adding your own dependencies to the image.

-The easiest way to build the image image is to use ``breeze`` script, but you can also build such customized
+The easiest way to build the image is to use ``breeze`` script, but you can also build such customized
 image by running appropriately crafted docker build in which you specify all the ``build-args``
 that you need to add to customize it. You can read about all the args and ways you can build the image
 in the `<#ci-image-build-arguments>`_ chapter below.
diff --git a/PULL_REQUEST_WORKFLOW.rst b/PULL_REQUEST_WORKFLOW.rst
index 39ef618..719e8c5 100644
--- a/PULL_REQUEST_WORKFLOW.rst
+++ b/PULL_REQUEST_WORKFLOW.rst
@@ -237,7 +237,7 @@ As explained above the approval and matrix tests workflow works according to the
    :align: center
    :alt: Full tests are needed for the PR

-4) If this or another committer "request changes" in in a previously approved PR with "full tests needed"
+4) If this or another committer "request changes" in a previously approved PR with "full tests needed"
   label, the bot automatically removes the label, moving it back to "run only default set of parameters"
   mode. For PRs touching core of airflow once the PR gets approved back, the label will be restored.
   If it was manually set by the committer, it has to be restored manually.
@@ -248,7 +248,7 @@ As explained above the approval and matrix tests workflow works according to the
   for the PRs and they provide good "notification" for the committer to act on a PR that was recently
   approved.

-The PR approval workflow is possible thanks two two custom GitHub Actions we've developed:
+The PR approval workflow is possible thanks to two custom GitHub Actions we've developed:

 * `Get workflow origin <https://github.com/potiuk/get-workflow-origin/>`_
 * `Label when approved <https://github.com/TobKed/label-when-approved-action>`_

diff --git a/airflow/jobs/scheduler_job.py b/airflow/jobs/scheduler_job.py
index ae91b0d..3970df9 100644
--- a/airflow/jobs/scheduler_job.py
+++ b/airflow/jobs/scheduler_job.py
@@ -1459,7 +1459,7 @@ class SchedulerJob(BaseJob):  # pylint: disable=too-many-instance-attributes
         By "next oldest", we mean hasn't been examined/scheduled in the most time.

         The reason we don't select all dagruns at once because the rows are selected with row locks, meaning
-        that only one scheduler can "process them", even it it is waiting behind other dags. Increasing this
+        that only one scheduler can "process them", even it is waiting behind other dags. Increasing this
         limit will allow more throughput for smaller DAGs but will likely slow down throughput for larger
         (>500 tasks.) DAGs

diff --git a/airflow/models/dag.py b/airflow/models/dag.py
index 47fc34b..0db5609 100644
--- a/airflow/models/dag.py
+++ b/airflow/models/dag.py
@@ -1087,7 +1087,7 @@ class DAG(LoggingMixin):
         # using the items() method for iterating, a copy of the
         # unsorted graph is used, allowing us to modify the unsorted
         # graph as we move through it. We also keep a flag for
-        # checking that that graph is acyclic, which is true if any
+        # checking that graph is acyclic, which is true if any
         # nodes are resolved during each pass through the graph. If
         # not, we need to exit as the graph therefore can't be
         # sorted.

diff --git a/airflow/models/dagrun.py b/airflow/models/dagrun.py
index fae58e1..674d4df 100644
--- a/airflow/models/dagrun.py
+++ b/airflow/models/dagrun.py
@@ -576,7 +576,7 @@ class DagRun(Base, LoggingMixin):
         started task within the DAG and calculate the expected DagRun start time (based on
         dag.execution_date & dag.schedule_interval), and minus these two values to get the delay.
         The emitted data may contains outlier (e.g. when the first task was cleared, so
-        the second task's start_date will be used), but we can get rid of the the outliers
+        the second task's start_date will be used), but we can get rid of the outliers
         on the stats side through the dashboards tooling built.
         Note, the stat will only be emitted if the DagRun is a scheduler triggered one
         (i.e. external_trigger is False).
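The dagrun.py docstring above describes deriving a scheduling-delay stat from the first task's start time and the expected run start. A minimal illustrative sketch of that calculation follows (not part of this commit; the function name and sample values are hypothetical, and it assumes a fixed timedelta schedule):

    # Illustrative sketch only -- not part of the diff above.
    from datetime import datetime, timedelta

    def scheduling_delay(
        execution_date: datetime,
        schedule_interval: timedelta,
        first_task_start: datetime,
    ) -> timedelta:
        # Expected start of a scheduled run = execution_date + schedule_interval;
        # the delay is how long after that the first task actually started.
        expected_start = execution_date + schedule_interval
        return first_task_start - expected_start

    # Hypothetical values: the 09:00 hourly run's first task started at 10:00:05.
    print(scheduling_delay(datetime(2021, 3, 7, 9), timedelta(hours=1), datetime(2021, 3, 7, 10, 0, 5)))
    # 0:00:05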
diff --git a/airflow/providers/apache/hive/example_dags/example_twitter_README.md b/airflow/providers/apache/hive/example_dags/example_twitter_README.md
index ff68856..c22ca2c 100644
--- a/airflow/providers/apache/hive/example_dags/example_twitter_README.md
+++ b/airflow/providers/apache/hive/example_dags/example_twitter_README.md
@@ -50,7 +50,7 @@ CREATE TABLE toTwitter_A(id BIGINT, id_str STRING
 alter table toTwitter_A SET serdeproperties ('skip.header.line.count' = '1');
 ```

-When you review the code for the DAG, you will notice that these tasks are generated using for loop. These two for loops could be combined into one loop. However, in most cases, you will be running different analysis on your incoming incoming and outgoing tweets, and hence they are kept separated in this example.
+When you review the code for the DAG, you will notice that these tasks are generated using for loop. These two for loops could be combined into one loop. However, in most cases, you will be running different analysis on your incoming and outgoing tweets, and hence they are kept separated in this example.

 Final step is a running the broker script, brokerapi.py, which will run queries in Hive and store the summarized data to MySQL in our case. To connect to Hive, pyhs2 library is extremely useful and easy to use. To insert data into MySQL from Python, sqlalchemy is also a good one to use.

 I hope you find this tutorial useful. If you have question feel free to ask me on [Twitter](https://twitter.com/EkhtiarSyed).<p>
 -Ekhtiar Syed

diff --git a/airflow/providers/apache/hive/example_dags/example_twitter_dag.py b/airflow/providers/apache/hive/example_dags/example_twitter_dag.py
index 8c9d1f3..b336d6f 100644
--- a/airflow/providers/apache/hive/example_dags/example_twitter_dag.py
+++ b/airflow/providers/apache/hive/example_dags/example_twitter_dag.py
@@ -132,7 +132,7 @@ with DAG(
     # The following tasks are generated using for loop. The first task puts the eight
     # csv files to HDFS. The second task loads these files from HDFS to respected Hive
     # tables. These two for loops could be combined into one loop. However, in most cases,
-    # you will be running different analysis on your incoming incoming and outgoing tweets,
+    # you will be running different analysis on your incoming and outgoing tweets,
     # and hence they are kept separated in this example.
     # --------------------------------------------------------------------------------

diff --git a/airflow/providers/google/cloud/example_dags/example_cloud_storage_transfer_service_aws.py b/airflow/providers/google/cloud/example_dags/example_cloud_storage_transfer_service_aws.py
index 353aa33..c1bc8c0 100644
--- a/airflow/providers/google/cloud/example_dags/example_cloud_storage_transfer_service_aws.py
+++ b/airflow/providers/google/cloud/example_dags/example_cloud_storage_transfer_service_aws.py
@@ -28,7 +28,7 @@ This DAG relies on the following OS environment variables

 .. warning::
     You need to provide a large enough set of data so that operations do not execute too quickly.
     Otherwise, DAG will fail.
-* GCP_TRANSFER_SECOND_TARGET_BUCKET - Google Cloud Storage bucket bucket to which files are copied
+* GCP_TRANSFER_SECOND_TARGET_BUCKET - Google Cloud Storage bucket to which files are copied
 * WAIT_FOR_OPERATION_POKE_INTERVAL - interval of what to check the status of the operation
   A smaller value than the default value accelerates the system test and ensures its correct execution with
   smaller quantities of files in the source bucket

diff --git a/airflow/providers/google/cloud/example_dags/example_cloud_storage_transfer_service_gcp.py b/airflow/providers/google/cloud/example_dags/example_cloud_storage_transfer_service_gcp.py
index c4cfa2e..8e851df 100644
--- a/airflow/providers/google/cloud/example_dags/example_cloud_storage_transfer_service_gcp.py
+++ b/airflow/providers/google/cloud/example_dags/example_cloud_storage_transfer_service_gcp.py
@@ -25,7 +25,7 @@ This DAG relies on the following OS environment variables

 * GCP_PROJECT_ID - Google Cloud Project to use for the Google Cloud Transfer Service.
 * GCP_TRANSFER_FIRST_TARGET_BUCKET - Google Cloud Storage bucket to which files are copied from AWS.
   It is also a source bucket in next step
-* GCP_TRANSFER_SECOND_TARGET_BUCKET - Google Cloud Storage bucket bucket to which files are copied
+* GCP_TRANSFER_SECOND_TARGET_BUCKET - Google Cloud Storage bucket to which files are copied
 """
 import os

diff --git a/airflow/providers/google/cloud/operators/dataflow.py b/airflow/providers/google/cloud/operators/dataflow.py
index f977704..92ae77e 100644
--- a/airflow/providers/google/cloud/operators/dataflow.py
+++ b/airflow/providers/google/cloud/operators/dataflow.py
@@ -84,7 +84,7 @@ class DataflowConfiguration:
         account from the list granting this role to the originating account (templated).
     :type impersonation_chain: Union[str, Sequence[str]]
     :param drain_pipeline: Optional, set to True if want to stop streaming job by draining it
-        instead of canceling during during killing task instance. See:
+        instead of canceling during killing task instance. See:
         https://cloud.google.com/dataflow/docs/guides/stopping-a-pipeline
     :type drain_pipeline: bool
     :param cancel_timeout: How long (in seconds) operator should wait for the pipeline to be
@@ -717,7 +717,7 @@ class DataflowStartFlexTemplateOperator(BaseOperator):
         domain-wide delegation enabled.
     :type delegate_to: str
     :param drain_pipeline: Optional, set to True if want to stop streaming job by draining it
-        instead of canceling during during killing task instance. See:
+        instead of canceling during killing task instance. See:
         https://cloud.google.com/dataflow/docs/guides/stopping-a-pipeline
     :type drain_pipeline: bool
     :param cancel_timeout: How long (in seconds) operator should wait for the pipeline to be
@@ -843,7 +843,7 @@ class DataflowStartSqlJobOperator(BaseOperator):
         domain-wide delegation enabled.
     :type delegate_to: str
     :param drain_pipeline: Optional, set to True if want to stop streaming job by draining it
-        instead of canceling during during killing task instance. See:
+        instead of canceling during killing task instance. See:
         https://cloud.google.com/dataflow/docs/guides/stopping-a-pipeline
     :type drain_pipeline: bool
     """
@@ -982,7 +982,7 @@ class DataflowCreatePythonJobOperator(BaseOperator):
         JOB_STATE_RUNNING state.
     :type poll_sleep: int
     :param drain_pipeline: Optional, set to True if want to stop streaming job by draining it
-        instead of canceling during during killing task instance. See:
+        instead of canceling during killing task instance. See:
        https://cloud.google.com/dataflow/docs/guides/stopping-a-pipeline
    :type drain_pipeline: bool
    :param cancel_timeout: How long (in seconds) operator should wait for the pipeline to be

diff --git a/airflow/providers/google/cloud/operators/dataproc.py b/airflow/providers/google/cloud/operators/dataproc.py
index 13b7026..7843164 100644
--- a/airflow/providers/google/cloud/operators/dataproc.py
+++ b/airflow/providers/google/cloud/operators/dataproc.py
@@ -610,7 +610,7 @@ class DataprocCreateClusterOperator(BaseOperator):
         # Check if cluster is not in ERROR state
         self._handle_error_state(hook, cluster)
         if cluster.status.state == cluster.status.State.CREATING:
-            # Wait for cluster to be be created
+            # Wait for cluster to be created
             cluster = self._wait_for_cluster_in_creating_state(hook)
             self._handle_error_state(hook, cluster)
         elif cluster.status.state == cluster.status.State.DELETING:

diff --git a/airflow/providers/google/suite/hooks/sheets.py b/airflow/providers/google/suite/hooks/sheets.py
index 3e4b62f..2c57231 100644
--- a/airflow/providers/google/suite/hooks/sheets.py
+++ b/airflow/providers/google/suite/hooks/sheets.py
@@ -271,7 +271,7 @@ class GSheetsHook(GoogleBaseHook):
         """
         if len(ranges) != len(values):
             raise AirflowException(
-                "'Ranges' and and 'Lists' must be of equal length. \n \
+                "'Ranges' and 'Lists' must be of equal length. \n \
                 'Ranges' is of length: {} and \n \
                 'Values' is of length: {}.".format(
                     str(len(ranges)), str(len(values))

diff --git a/airflow/providers/google/suite/transfers/gcs_to_gdrive.py b/airflow/providers/google/suite/transfers/gcs_to_gdrive.py
index 7427c36..06419b9 100644
--- a/airflow/providers/google/suite/transfers/gcs_to_gdrive.py
+++ b/airflow/providers/google/suite/transfers/gcs_to_gdrive.py
@@ -30,7 +30,7 @@ WILDCARD = "*"

 class GCSToGoogleDriveOperator(BaseOperator):
     """
-    Copies objects from a Google Cloud Storage service service to Google Drive service, with renaming
+    Copies objects from a Google Cloud Storage service to a Google Drive service, with renaming
     if requested.

     Using this operator requires the following OAuth 2.0 scope:

diff --git a/airflow/www/templates/airflow/graph.html b/airflow/www/templates/airflow/graph.html
index 807cef1..44b0e01 100644
--- a/airflow/www/templates/airflow/graph.html
+++ b/airflow/www/templates/airflow/graph.html
@@ -673,7 +673,7 @@
       // Is there a better way to get node_width and node_height ?
       const [node_width, node_height] = [rect[0][0].attributes.width.value, rect[0][0].attributes.height.value];

-      // Calculate zoom scale to fill most of the canvas with the the node/cluster in focus.
+      // Calculate zoom scale to fill most of the canvas with the node/cluster in focus.
       const scale = Math.min(
         Math.min(width / node_width, height / node_height),
         1.5, // cap zoom level to 1.5 so nodes are not too large

diff --git a/breeze b/breeze
index fd8cfc2..6c352c9 100755
--- a/breeze
+++ b/breeze
@@ -2012,7 +2012,7 @@ ${CMDNAME} static-check [FLAGS] static_check [-- <EXTRA_ARGS>]

 ${FORMATTED_STATIC_CHECKS}

-      You can pass extra arguments including options to to the pre-commit framework as
+      You can pass extra arguments including options to the pre-commit framework as
       <EXTRA_ARGS> passed after --. For example:

      '${CMDNAME} static-check mypy' or

diff --git a/chart/values.yaml b/chart/values.yaml
index 30ff4dc..cbced4f 100644
--- a/chart/values.yaml
+++ b/chart/values.yaml
@@ -166,7 +166,7 @@ secret: []
 # Extra secrets that will be managed by the chart
 # (You can use them with extraEnv or extraEnvFrom or some of the extraVolumes values).
 # The format is "key/value" where
-#  * key (can be templated) is the the name the secret that will be created
+#  * key (can be templated) is the name of the secret that will be created
 #  * value: an object with the standard 'data' or 'stringData' key (or both).
 #    The value associated with those keys must be a string (can be templated)
 extraSecrets: {}
@@ -185,7 +185,7 @@ extraSecrets: {}
 # Extra ConfigMaps that will be managed by the chart
 # (You can use them with extraEnv or extraEnvFrom or some of the extraVolumes values).
 # The format is "key/value" where
-#  * key (can be templated) is the the name the configmap that will be created
+#  * key (can be templated) is the name of the configmap that will be created
 #  * value: an object with the standard 'data' key.
 #    The value associated with this keys must be a string (can be templated)
 extraConfigMaps: {}

diff --git a/dev/provider_packages/prepare_provider_packages.py b/dev/provider_packages/prepare_provider_packages.py
index 49408d8..0f8a65c 100755
--- a/dev/provider_packages/prepare_provider_packages.py
+++ b/dev/provider_packages/prepare_provider_packages.py
@@ -971,7 +971,7 @@ def make_sure_remote_apache_exists_and_fetch():
     Make sure that apache remote exist in git. We need to take a log from the apache
     repository - not locally.

-    Also the the local repo might be shallow so we need to unshallow it.
+    Also the local repo might be shallow so we need to unshallow it.

     This will:
     * check if the remote exists and add if it does not

diff --git a/docs/apache-airflow-providers-google/operators/cloud/kubernetes_engine.rst b/docs/apache-airflow-providers-google/operators/cloud/kubernetes_engine.rst
index 5860679..10cbc3e 100644
--- a/docs/apache-airflow-providers-google/operators/cloud/kubernetes_engine.rst
+++ b/docs/apache-airflow-providers-google/operators/cloud/kubernetes_engine.rst
@@ -36,7 +36,7 @@ Prerequisite Tasks
 Manage GKE cluster
 ^^^^^^^^^^^^^^^^^^

-A cluster is the foundation of GKE - all workloads run on on top of the cluster. It is made up on a cluster master
+A cluster is the foundation of GKE - all workloads run on top of the cluster. It is made up on a cluster master
 and worker nodes. The lifecycle of the master is managed by GKE when creating or deleting a cluster.
 The worker nodes are represented as Compute Engine VM instances that GKE creates on your behalf when creating a cluster.

diff --git a/docs/apache-airflow/dag-run.rst b/docs/apache-airflow/dag-run.rst
index 5fc2426..72204f1 100644
--- a/docs/apache-airflow/dag-run.rst
+++ b/docs/apache-airflow/dag-run.rst
@@ -22,7 +22,7 @@ A DAG Run is an object representing an instantiation of the DAG in time.
 Each DAG may or may not have a schedule, which informs how DAG Runs are
 created. ``schedule_interval`` is defined as a DAG argument, which can be
 passed a `cron expression <https://en.wikipedia.org/wiki/Cron#CRON_expression>`_ as
-a ``str``, a ``datetime.timedelta`` object, or one of of the following cron "presets".
+a ``str``, a ``datetime.timedelta`` object, or one of the following cron "presets".

 .. tip::
     You can use an online editor for CRON expressions such as `Crontab guru <https://crontab.guru/>`_

diff --git a/docs/apache-airflow/production-deployment.rst b/docs/apache-airflow/production-deployment.rst
index 042b655..1565fa8e 100644
--- a/docs/apache-airflow/production-deployment.rst
+++ b/docs/apache-airflow/production-deployment.rst
@@ -230,7 +230,7 @@ dependencies that are not needed in the final image. You need to use Airflow Sou
 from the `official distribution folder of Apache Airflow <https://downloads.apache.org/airflow/>`_ for the
 released versions, or checked out from the GitHub project if you happen to do it from git sources.

-The easiest way to build the image image is to use ``breeze`` script, but you can also build such customized
+The easiest way to build the image is to use ``breeze`` script, but you can also build such customized
 image by running appropriately crafted docker build in which you specify all the ``build-args``
 that you need to add to customize it. You can read about all the args and ways you can build the image
 in the `<#production-image-build-arguments>`_ chapter below.

diff --git a/docs/apache-airflow/upgrading-to-2.rst b/docs/apache-airflow/upgrading-to-2.rst
index 876d2cd..13e3cec 100644
--- a/docs/apache-airflow/upgrading-to-2.rst
+++ b/docs/apache-airflow/upgrading-to-2.rst
@@ -299,7 +299,7 @@ When DAGs are initialized with the ``access_control`` variable set, any usage of
 If you previously used non-RBAC UI, you have to switch to the new RBAC-UI and create users to be able
 to access Airflow's webserver. For more details on CLI to create users see :doc:`cli-and-env-variables-ref`

-Please note that that custom auth backends will need re-writing to target new FAB based UI.
+Please note that custom auth backends will need re-writing to target new FAB based UI.

 As part of this change, a few configuration items in ``[webserver]`` section are removed and no longer
 applicable, including ``authenticate``, ``filter_by_owner``, ``owner_mode``, and ``rbac``.
@@ -1110,7 +1110,7 @@ and there is no need for it to be accessible from the CLI interface.
 If the DAGRun was triggered with conf key/values passed in, they will also be printed in the dag_state CLI response
 ie. running, {"name": "bob"}
-whereas in in prior releases it just printed the state:
+whereas in prior releases it just printed the state:
 ie. running

 **Deprecating ignore_first_depends_on_past on backfill command and default it to True**
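The dag-run.rst hunk above lists the forms that ``schedule_interval`` accepts. A minimal illustrative sketch of using a cron preset (not part of this commit; the DAG id and dates are hypothetical):

    # Illustrative sketch only -- not part of the diff above.
    # schedule_interval accepts a cron string, a datetime.timedelta, or a cron preset.
    from datetime import datetime, timedelta

    from airflow import DAG

    with DAG(
        dag_id="example_cron_preset",      # hypothetical DAG id
        start_date=datetime(2021, 1, 1),
        schedule_interval="@daily",        # a preset; "0 0 * * *" or timedelta(days=1) also work
        catchup=False,
    ) as dag:
        pass                               # tasks omitted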

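The dataflow.py hunks above touch the docstrings for ``drain_pipeline``. A minimal illustrative sketch of passing that option together with ``cancel_timeout`` via ``DataflowConfiguration`` (not part of this commit; the job name and region are hypothetical):

    # Illustrative sketch only -- not part of the diff above.
    from airflow.providers.google.cloud.operators.dataflow import DataflowConfiguration

    dataflow_config = DataflowConfiguration(
        job_name="example-streaming-job",  # hypothetical
        location="europe-west1",           # hypothetical region
        drain_pipeline=True,               # drain the streaming job instead of canceling it on kill
        cancel_timeout=600,                # seconds to wait for the pipeline to stop
    )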