This is an automated email from the ASF dual-hosted git repository.
taragolis pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/airflow.git
The following commit(s) were added to refs/heads/main by this push:
new 05945a47f3 add doc about Yandex Query operator (#39445)
05945a47f3 is described below
commit 05945a47f32571422fec82559cbde366f255b8ed
Author: uzhastik <[email protected]>
AuthorDate: Thu May 9 01:20:41 2024 +0300
add doc about Yandex Query operator (#39445)
* add yq sample
* fix style and links to doc from provider.yaml
* fix style again
* more links
* reorg operators doc
* fix links to how-to-guide
---
airflow/providers/yandex/provider.yaml | 4 +-
docs/apache-airflow-providers-yandex/index.rst | 2 +-
.../{operators.rst => operators/dataproc.rst} | 0
.../operators/index.rst | 28 ++++++++++++
.../{operators.rst => operators/yq.rst} | 23 +++-------
.../system/providers/yandex/example_yandexcloud.py | 4 +-
.../yandex/example_yandexcloud_dataproc.py | 4 +-
.../example_yandexcloud_dataproc_lightweight.py | 4 +-
...oc_lightweight.py => example_yandexcloud_yq.py} | 51 +++++-----------------
9 files changed, 54 insertions(+), 66 deletions(-)
diff --git a/airflow/providers/yandex/provider.yaml b/airflow/providers/yandex/provider.yaml
index de488215a0..4d8b6d6c9b 100644
--- a/airflow/providers/yandex/provider.yaml
+++ b/airflow/providers/yandex/provider.yaml
@@ -62,14 +62,14 @@ integrations:
- integration-name: Yandex.Cloud Dataproc
external-doc-url: https://cloud.yandex.com/dataproc
how-to-guide:
- - /docs/apache-airflow-providers-yandex/operators.rst
+ - /docs/apache-airflow-providers-yandex/operators/dataproc.rst
logo: /integration-logos/yandex/Yandex-Cloud.png
tags: [service]
- integration-name: Yandex.Cloud YQ
external-doc-url: https://cloud.yandex.com/en/services/query
how-to-guide:
- - /docs/apache-airflow-providers-yandex/operators.rst
+ - /docs/apache-airflow-providers-yandex/operators/yq.rst
logo: /integration-logos/yandex/Yandex-Cloud.png
tags: [service]
diff --git a/docs/apache-airflow-providers-yandex/index.rst b/docs/apache-airflow-providers-yandex/index.rst
index d5eb31c75a..4fb3ed732e 100644
--- a/docs/apache-airflow-providers-yandex/index.rst
+++ b/docs/apache-airflow-providers-yandex/index.rst
@@ -37,7 +37,7 @@
Configuration <configurations-ref>
Connection types <connections/yandexcloud>
Lockbox Secret Backend <secrets-backends/yandex-cloud-lockbox-secret-backend>
- Operators <operators>
+ Operators <operators/index>
.. toctree::
:hidden:
diff --git a/docs/apache-airflow-providers-yandex/operators.rst b/docs/apache-airflow-providers-yandex/operators/dataproc.rst
similarity index 100%
copy from docs/apache-airflow-providers-yandex/operators.rst
copy to docs/apache-airflow-providers-yandex/operators/dataproc.rst
diff --git a/docs/apache-airflow-providers-yandex/operators/index.rst b/docs/apache-airflow-providers-yandex/operators/index.rst
new file mode 100644
index 0000000000..12b05418e1
--- /dev/null
+++ b/docs/apache-airflow-providers-yandex/operators/index.rst
@@ -0,0 +1,28 @@
+ .. Licensed to the Apache Software Foundation (ASF) under one
+ or more contributor license agreements. See the NOTICE file
+ distributed with this work for additional information
+ regarding copyright ownership. The ASF licenses this file
+ to you under the Apache License, Version 2.0 (the
+ "License"); you may not use this file except in compliance
+ with the License. You may obtain a copy of the License at
+
+ .. http://www.apache.org/licenses/LICENSE-2.0
+
+ .. Unless required by applicable law or agreed to in writing,
+ software distributed under the License is distributed on an
+ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ KIND, either express or implied. See the License for the
+ specific language governing permissions and limitations
+ under the License.
+
+
+
+Yandex.Cloud Operators
+======================
+
+
+.. toctree::
+ :maxdepth: 1
+ :glob:
+
+ *
diff --git a/docs/apache-airflow-providers-yandex/operators.rst b/docs/apache-airflow-providers-yandex/operators/yq.rst
similarity index 50%
rename from docs/apache-airflow-providers-yandex/operators.rst
rename to docs/apache-airflow-providers-yandex/operators/yq.rst
index 2bb08d859d..78bdb733ee 100644
--- a/docs/apache-airflow-providers-yandex/operators.rst
+++ b/docs/apache-airflow-providers-yandex/operators/yq.rst
@@ -16,22 +16,13 @@
under the License.
-Yandex.Cloud Data Proc Operators
-================================
-
-`Yandex Data Proc <https://cloud.yandex.com/services/data-proc>`__ is a service
-that helps you deploy Apache Hadoop®* and Apache Spark™ clusters in the Yandex Cloud infrastructure.
-
-With Data Proc, you can manage the cluster size and node capacity,
-as well as work with various Apache® services,
-such as Spark, HDFS, YARN, Hive, HBase, Oozie, Sqoop, Flume, Tez, and Zeppelin.
-
-Apache Hadoop is used for storing and analyzing structured and unstructured big data.
-
-Apache Spark is a tool for quick data processing
-that can be integrated with Apache Hadoop and other storage systems.
+Yandex Query Operators
+======================
+`Yandex Query <https://yandex.cloud/en/services/query>`__ is a service in the Yandex Cloud to process data from different sources such as
+`Object Storage <https://yandex.cloud/ru/services/storage>`__, `MDB ClickHouse <https://yandex.cloud/ru/services/managed-clickhouse>`__,
+`MDB PostgreSQL <https://yandex.cloud/ru/services/managed-postgresql>`__, `Yandex DataStreams <https://yandex.cloud/ru/services/data-streams>`__ using SQL scripts.
Using the operators
^^^^^^^^^^^^^^^^^^^
-To learn how to use Data Proc operators,
-see `example DAGs <https://github.com/apache/airflow/tree/providers-yandex/|version|/tests/system/providers/yandex/example_yandexcloud_dataproc.py>`_.
+To learn how to use the Yandex Query operator,
+see `example DAG <https://github.com/apache/airflow/tree/providers-yandex/|version|/tests/system/providers/yandex/example_yandexcloud_yq.py>`__.
diff --git a/tests/system/providers/yandex/example_yandexcloud.py b/tests/system/providers/yandex/example_yandexcloud.py
index c48b58c264..2751458ae9 100644
--- a/tests/system/providers/yandex/example_yandexcloud.py
+++ b/tests/system/providers/yandex/example_yandexcloud.py
@@ -16,7 +16,6 @@
# under the License.
from __future__ import annotations
-import os
from datetime import datetime
import yandex.cloud.dataproc.v1.cluster_pb2 as cluster_pb
@@ -32,8 +31,9 @@ from google.protobuf.json_format import MessageToDict
from airflow import DAG
from airflow.decorators import task
from airflow.providers.yandex.hooks.yandex import YandexCloudBaseHook
+from tests.system.utils import get_test_env_id
-ENV_ID = os.environ.get("SYSTEM_TESTS_ENV_ID")
+ENV_ID = get_test_env_id()
DAG_ID = "example_yandexcloud_hook"
# Fill it with your identifiers
diff --git a/tests/system/providers/yandex/example_yandexcloud_dataproc.py b/tests/system/providers/yandex/example_yandexcloud_dataproc.py
index a08e60daa8..cfae4e94e0 100644
--- a/tests/system/providers/yandex/example_yandexcloud_dataproc.py
+++ b/tests/system/providers/yandex/example_yandexcloud_dataproc.py
@@ -16,7 +16,6 @@
# under the License.
from __future__ import annotations
-import os
import uuid
from datetime import datetime
@@ -32,6 +31,7 @@ from airflow.providers.yandex.operators.yandexcloud_dataproc import (
# Name of the datacenter where Dataproc cluster will be created
from airflow.utils.trigger_rule import TriggerRule
+from tests.system.utils import get_test_env_id
# should be filled with appropriate ids
@@ -41,7 +41,7 @@ AVAILABILITY_ZONE_ID = "ru-central1-c"
# Dataproc cluster jobs will produce logs in specified s3 bucket
S3_BUCKET_NAME_FOR_JOB_LOGS = ""
-ENV_ID = os.environ.get("SYSTEM_TESTS_ENV_ID")
+ENV_ID = get_test_env_id()
DAG_ID = "example_yandexcloud_dataproc_operator"
with DAG(
diff --git a/tests/system/providers/yandex/example_yandexcloud_dataproc_lightweight.py b/tests/system/providers/yandex/example_yandexcloud_dataproc_lightweight.py
index 930d6bfc9d..fa5a3c758b 100644
--- a/tests/system/providers/yandex/example_yandexcloud_dataproc_lightweight.py
+++ b/tests/system/providers/yandex/example_yandexcloud_dataproc_lightweight.py
@@ -16,7 +16,6 @@
# under the License.
from __future__ import annotations
-import os
from datetime import datetime
from airflow import DAG
@@ -28,6 +27,7 @@ from airflow.providers.yandex.operators.yandexcloud_dataproc import (
# Name of the datacenter where Dataproc cluster will be created
from airflow.utils.trigger_rule import TriggerRule
+from tests.system.utils import get_test_env_id
# should be filled with appropriate ids
@@ -37,7 +37,7 @@ AVAILABILITY_ZONE_ID = "ru-central1-c"
# Dataproc cluster will use this bucket as distributed storage
S3_BUCKET_NAME = ""
-ENV_ID = os.environ.get("SYSTEM_TESTS_ENV_ID")
+ENV_ID = get_test_env_id()
DAG_ID = "example_yandexcloud_dataproc_lightweight"
with DAG(
diff --git a/tests/system/providers/yandex/example_yandexcloud_dataproc_lightweight.py b/tests/system/providers/yandex/example_yandexcloud_yq.py
similarity index 50%
copy from tests/system/providers/yandex/example_yandexcloud_dataproc_lightweight.py
copy to tests/system/providers/yandex/example_yandexcloud_yq.py
index 930d6bfc9d..0ebef685e2 100644
--- a/tests/system/providers/yandex/example_yandexcloud_dataproc_lightweight.py
+++ b/tests/system/providers/yandex/example_yandexcloud_yq.py
@@ -16,29 +16,15 @@
# under the License.
from __future__ import annotations
-import os
from datetime import datetime
-from airflow import DAG
-from airflow.providers.yandex.operators.yandexcloud_dataproc import (
- DataprocCreateClusterOperator,
- DataprocCreateSparkJobOperator,
- DataprocDeleteClusterOperator,
-)
+from airflow.models.dag import DAG
+from airflow.operators.empty import EmptyOperator
+from airflow.providers.yandex.operators.yq import YQExecuteQueryOperator
+from tests.system.utils import get_test_env_id
-# Name of the datacenter where Dataproc cluster will be created
-from airflow.utils.trigger_rule import TriggerRule
-
-# should be filled with appropriate ids
-
-
-AVAILABILITY_ZONE_ID = "ru-central1-c"
-
-# Dataproc cluster will use this bucket as distributed storage
-S3_BUCKET_NAME = ""
-
-ENV_ID = os.environ.get("SYSTEM_TESTS_ENV_ID")
-DAG_ID = "example_yandexcloud_dataproc_lightweight"
+ENV_ID = get_test_env_id()
+DAG_ID = "example_yandexcloud_yq"
with DAG(
DAG_ID,
@@ -46,29 +32,12 @@ with DAG(
start_date=datetime(2021, 1, 1),
tags=["example"],
) as dag:
- create_cluster = DataprocCreateClusterOperator(
- task_id="create_cluster",
- zone=AVAILABILITY_ZONE_ID,
- s3_bucket=S3_BUCKET_NAME,
- computenode_count=1,
- datanode_count=0,
- services=("SPARK", "YARN"),
- )
-
- create_spark_job = DataprocCreateSparkJobOperator(
- cluster_id=create_cluster.cluster_id,
- task_id="create_spark_job",
- main_jar_file_uri="file:///usr/lib/spark/examples/jars/spark-examples.jar",
- main_class="org.apache.spark.examples.SparkPi",
- args=["1000"],
+ run_this_last = EmptyOperator(
+ task_id="run_this_last",
)
- delete_cluster = DataprocDeleteClusterOperator(
- cluster_id=create_cluster.cluster_id,
- task_id="delete_cluster",
- trigger_rule=TriggerRule.ALL_DONE,
- )
- create_spark_job >> delete_cluster
+ yq_operator = YQExecuteQueryOperator(task_id="sample_query", sql="select 33 as d, 44 as t")
+ yq_operator >> run_this_last
from tests.system.utils.watcher import watcher