This is an automated email from the ASF dual-hosted git repository.
hansva pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hop.git
The following commit(s) were added to refs/heads/master by this push:
new 1a470698a0 extended documentation on running workflows and pipelines in Apache Airflow. fixes #2777
new 244beda36d Merge pull request #2926 from bamaer/2777
1a470698a0 is described below
commit 1a470698a04f63b909a75fcb8f224afec691746f
Author: Bart Maertens <[email protected]>
AuthorDate: Sat May 13 08:54:03 2023 +0200
extended documentation on running workflows and pipelines in Apache Airflow. fixes #2777
---
.../samples/pipelines/pipeline-with-parameter.hpl | 178 ++++++++++++
.../apache-airflow/docker-compose.yaml | 298 +++++++++++++++++++++
.../run-hop-in-apache-airflow/airflow-dag-run.png | Bin 16819 -> 0 bytes
.../run-hop-in-apache-airflow/airflow-logo.svg | 119 ++++++++
.../apache-airflow-dag-available.png | Bin 0 -> 98334 bytes
.../apache-airflow-dag-error.png | Bin 0 -> 68432 bytes
.../apache-airflow-dag-graph.png | Bin 0 -> 164591 bytes
.../apache-airflow-dag-logs.png | Bin 0 -> 706386 bytes
.../apache-airflow-dag-run.png | Bin 0 -> 32151 bytes
.../apache-airflow-dag-runs.png | Bin 0 -> 49947 bytes
.../apache-airflow-empty-server.png | Bin 0 -> 86076 bytes
.../apache-airflow-pipeline-with-parameter.png | Bin 0 -> 44049 bytes
.../apache-airflow-run-config (1).png | Bin 0 -> 14934 bytes
.../apache-airflow-run-config.png | Bin 0 -> 21926 bytes
.../apache-airflow-trigger-dag-with-config.png | Bin 0 -> 12574 bytes
.../apache-airflow-trigger.png | Bin 0 -> 15005 bytes
.../apache-airflow-two-dags.png | Bin 0 -> 32222 bytes
.../how-to-guides/run-hop-in-apache-airflow.adoc | 293 ++++++++++++++++++--
18 files changed, 867 insertions(+), 21 deletions(-)
diff --git a/assemblies/static/src/main/resources/config/projects/samples/pipelines/pipeline-with-parameter.hpl b/assemblies/static/src/main/resources/config/projects/samples/pipelines/pipeline-with-parameter.hpl
new file mode 100644
index 0000000000..7fcee49119
--- /dev/null
+++ b/assemblies/static/src/main/resources/config/projects/samples/pipelines/pipeline-with-parameter.hpl
@@ -0,0 +1,178 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
+
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements. See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to You under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+
+-->
+<pipeline>
+ <info>
+ <name>pipeline-with-parameter</name>
+ <name_sync_with_filename>Y</name_sync_with_filename>
+ <description/>
+ <extended_description/>
+ <pipeline_version/>
+ <pipeline_type>Normal</pipeline_type>
+ <pipeline_status>0</pipeline_status>
+ <parameters>
+ <parameter>
+ <name>PRM_EXAMPLE</name>
+ <default_value/>
+ <description/>
+ </parameter>
+ </parameters>
+ <capture_transform_performance>N</capture_transform_performance>
+ <transform_performance_capturing_delay>1000</transform_performance_capturing_delay>
+ <transform_performance_capturing_size_limit>100</transform_performance_capturing_size_limit>
+ <created_user>-</created_user>
+ <created_date>2023/05/08 08:19:58.557</created_date>
+ <modified_user>-</modified_user>
+ <modified_date>2023/05/08 08:19:58.557</modified_date>
+ </info>
+ <notepads>
+ </notepads>
+ <order>
+ <hop>
+ <from>get ${PRM_EXAMPLE}</from>
+ <to>write parameter to log</to>
+ <enabled>Y</enabled>
+ </hop>
+ <hop>
+ <from>write parameter to log</from>
+ <to>get ${ENV_VARIABLE}</to>
+ <enabled>Y</enabled>
+ </hop>
+ <hop>
+ <from>get ${ENV_VARIABLE}</from>
+ <to>write env_variable to log</to>
+ <enabled>Y</enabled>
+ </hop>
+ </order>
+ <transform>
+ <name>get ${PRM_EXAMPLE}</name>
+ <type>GetVariable</type>
+ <description/>
+ <distribute>Y</distribute>
+ <custom_distribution/>
+ <copies>1</copies>
+ <partitioning>
+ <method>none</method>
+ <schema_name/>
+ </partitioning>
+ <fields>
+ <field>
+ <length>-1</length>
+ <name>example</name>
+ <precision>-1</precision>
+ <trim_type>none</trim_type>
+ <type>String</type>
+ <variable>${PRM_EXAMPLE}</variable>
+ </field>
+ </fields>
+ <attributes/>
+ <GUI>
+ <xloc>192</xloc>
+ <yloc>96</yloc>
+ </GUI>
+ </transform>
+ <transform>
+ <name>write parameter to log</name>
+ <type>WriteToLog</type>
+ <description/>
+ <distribute>Y</distribute>
+ <custom_distribution/>
+ <copies>1</copies>
+ <partitioning>
+ <method>none</method>
+ <schema_name/>
+ </partitioning>
+ <loglevel>log_level_basic</loglevel>
+ <displayHeader>Y</displayHeader>
+ <limitRows>N</limitRows>
+ <limitRowsNumber>0</limitRowsNumber>
+ <logmessage>we received '${PRM_EXAMPLE}' as the value for this pipeline's parameter.</logmessage>
+ <fields>
+ <field>
+ <name>example</name>
+ </field>
+ </fields>
+ <attributes/>
+ <GUI>
+ <xloc>384</xloc>
+ <yloc>96</yloc>
+ </GUI>
+ </transform>
+ <transform>
+ <name>get ${ENV_VARIABLE}</name>
+ <type>GetVariable</type>
+ <description/>
+ <distribute>Y</distribute>
+ <custom_distribution/>
+ <copies>1</copies>
+ <partitioning>
+ <method>none</method>
+ <schema_name/>
+ </partitioning>
+ <fields>
+ <field>
+ <currency/>
+ <decimal/>
+ <format/>
+ <group/>
+ <length>-1</length>
+ <name>env_variable</name>
+ <precision>-1</precision>
+ <trim_type>none</trim_type>
+ <type>String</type>
+ <variable>${ENV_VARIABLE}</variable>
+ </field>
+ </fields>
+ <attributes/>
+ <GUI>
+ <xloc>384</xloc>
+ <yloc>208</yloc>
+ </GUI>
+ </transform>
+ <transform>
+ <name>write env_variable to log</name>
+ <type>WriteToLog</type>
+ <description/>
+ <distribute>Y</distribute>
+ <custom_distribution/>
+ <copies>1</copies>
+ <partitioning>
+ <method>none</method>
+ <schema_name/>
+ </partitioning>
+ <loglevel>log_level_basic</loglevel>
+ <displayHeader>Y</displayHeader>
+ <limitRows>N</limitRows>
+ <limitRowsNumber>0</limitRowsNumber>
+ <logmessage>the value for variable ENV_VARIABLE is ${ENV_VARIABLE}</logmessage>
+ <fields>
+ <field>
+ <name>env_variable</name>
+ </field>
+ </fields>
+ <attributes/>
+ <GUI>
+ <xloc>592</xloc>
+ <yloc>208</yloc>
+ </GUI>
+ </transform>
+ <transform_error_handling>
+ </transform_error_handling>
+ <attributes/>
+</pipeline>
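The sample pipeline above only reads the `PRM_EXAMPLE` parameter and the `ENV_VARIABLE` environment variable and writes both to the log. Conceptually, the `${...}` token resolution the two GetVariable transforms rely on behaves like the Python sketch below; this is an illustration of the substitution idea only, not Hop's actual resolution code, and the parameters-before-environment precedence shown is an assumption.

```python
import os
import re

def resolve_variables(text, parameters):
    """Replace ${NAME} tokens: pipeline parameters first, then environment
    variables; unknown tokens are left untouched (assumed precedence)."""
    def lookup(match):
        name = match.group(1)
        if name in parameters:
            return parameters[name]
        return os.environ.get(name, match.group(0))
    return re.sub(r"\$\{(\w+)\}", lookup, text)

# Mimic what the two WriteToLog transforms in the pipeline would print.
os.environ["ENV_VARIABLE"] = "from-the-environment"
params = {"PRM_EXAMPLE": "hello-airflow"}

print(resolve_variables(
    "we received '${PRM_EXAMPLE}' as the value for this pipeline's parameter.", params))
print(resolve_variables(
    "the value for variable ENV_VARIABLE is ${ENV_VARIABLE}", params))
```

Passing a different value for `PRM_EXAMPLE` at run time (for example from an Airflow DAG run configuration) simply changes what the first log line reports.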
diff --git a/docs/hop-user-manual/modules/ROOT/assets/files/how-to-guides/apache-airflow/docker-compose.yaml b/docs/hop-user-manual/modules/ROOT/assets/files/how-to-guides/apache-airflow/docker-compose.yaml
new file mode 100644
index 0000000000..6538da8265
--- /dev/null
+++ b/docs/hop-user-manual/modules/ROOT/assets/files/how-to-guides/apache-airflow/docker-compose.yaml
@@ -0,0 +1,298 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+
+# Basic Airflow cluster configuration for CeleryExecutor with Redis and PostgreSQL.
+#
+# WARNING: This configuration is for local development. Do not use it in a production deployment.
+#
+# This configuration supports basic configuration using environment variables or an .env file
+# The following variables are supported:
+#
+# AIRFLOW_IMAGE_NAME - Docker image name used to run Airflow.
+# Default: apache/airflow:2.6.0
+# AIRFLOW_UID - User ID in Airflow containers
+# Default: 50000
+# AIRFLOW_PROJ_DIR             - Base path to which all the files will be volumed.
+#                                Default: .
+# Those configurations are useful mostly in case of standalone testing/running Airflow in test/try-out mode
+#
+# _AIRFLOW_WWW_USER_USERNAME   - Username for the administrator account (if requested).
+#                                Default: airflow
+# _AIRFLOW_WWW_USER_PASSWORD   - Password for the administrator account (if requested).
+#                                Default: airflow
+# _PIP_ADDITIONAL_REQUIREMENTS - Additional PIP requirements to add when starting all containers.
+#                                Use this option ONLY for quick checks. Installing requirements at container
+#                                startup is done EVERY TIME the service is started.
+#                                A better way is to build a custom image or extend the official image
+#                                as described in https://airflow.apache.org/docs/docker-stack/build.html.
+#                                Default: ''
+#
+# Feel free to modify this file to suit your needs.
+---
+version: '3.8'
+x-airflow-common:
+ &airflow-common
+ # In order to add custom dependencies or upgrade provider packages you can use your extended image.
+ # Comment the image line, place your Dockerfile in the directory where you placed the docker-compose.yaml
+ # and uncomment the "build" line below, Then run `docker-compose build` to build the images.
+ image: ${AIRFLOW_IMAGE_NAME:-apache/airflow:2.6.0}
+ # build: .
+ environment:
+ &airflow-common-env
+ AIRFLOW__CORE__EXECUTOR: CeleryExecutor
+ AIRFLOW__DATABASE__SQL_ALCHEMY_CONN: postgresql+psycopg2://airflow:airflow@postgres/airflow
+ # For backward compatibility, with Airflow <2.3
+ AIRFLOW__CORE__SQL_ALCHEMY_CONN: postgresql+psycopg2://airflow:airflow@postgres/airflow
+ AIRFLOW__CELERY__RESULT_BACKEND: db+postgresql://airflow:airflow@postgres/airflow
+ AIRFLOW__CELERY__BROKER_URL: redis://:@redis:6379/0
+ AIRFLOW__CORE__FERNET_KEY: ''
+ AIRFLOW__CORE__DAGS_ARE_PAUSED_AT_CREATION: 'true'
+ AIRFLOW__CORE__LOAD_EXAMPLES: 'false'
+ AIRFLOW__API__AUTH_BACKENDS: 'airflow.api.auth.backend.basic_auth,airflow.api.auth.backend.session'
+ # yamllint disable rule:line-length
+ # Use simple http server on scheduler for health checks
+ # See https://airflow.apache.org/docs/apache-airflow/stable/administration-and-deployment/logging-monitoring/check-health.html#scheduler-health-check-server
+ # yamllint enable rule:line-length
+ AIRFLOW__SCHEDULER__ENABLE_HEALTH_CHECK: 'true'
+ # WARNING: Use _PIP_ADDITIONAL_REQUIREMENTS option ONLY for a quick checks
+ # for other purpose (development, test and especially production usage) build/extend Airflow image.
+ _PIP_ADDITIONAL_REQUIREMENTS: ${_PIP_ADDITIONAL_REQUIREMENTS:-}
+ volumes:
+ - ${AIRFLOW_PROJ_DIR:-.}/dags:/opt/airflow/dags
+ - ${AIRFLOW_PROJ_DIR:-.}/logs:/opt/airflow/logs
+ - ${AIRFLOW_PROJ_DIR:-.}/plugins:/opt/airflow/plugins
+ user: "${AIRFLOW_UID:-50000}:0"
+ depends_on:
+ &airflow-common-depends-on
+ redis:
+ condition: service_healthy
+ postgres:
+ condition: service_healthy
+
+services:
+ postgres:
+ image: postgres:13
+ environment:
+ POSTGRES_USER: airflow
+ POSTGRES_PASSWORD: airflow
+ POSTGRES_DB: airflow
+ volumes:
+ - postgres-db-volume:/var/lib/postgresql/data
+ healthcheck:
+ test: ["CMD", "pg_isready", "-U", "airflow"]
+ interval: 10s
+ retries: 5
+ start_period: 5s
+ restart: always
+
+ redis:
+ image: redis:latest
+ expose:
+ - 6379
+ healthcheck:
+ test: ["CMD", "redis-cli", "ping"]
+ interval: 10s
+ timeout: 30s
+ retries: 50
+ start_period: 30s
+ restart: always
+
+ airflow-webserver:
+ <<: *airflow-common
+ command: webserver
+ ports:
+ - "8080:8080"
+ healthcheck:
+ test: ["CMD", "curl", "--fail", "http://localhost:8080/health"]
+ interval: 30s
+ timeout: 10s
+ retries: 5
+ start_period: 30s
+ restart: always
+ depends_on:
+ <<: *airflow-common-depends-on
+ airflow-init:
+ condition: service_completed_successfully
+
+ airflow-scheduler:
+ <<: *airflow-common
+ command: scheduler
+ healthcheck:
+ test: ["CMD", "curl", "--fail", "http://localhost:8974/health"]
+ interval: 30s
+ timeout: 10s
+ retries: 5
+ start_period: 30s
+ restart: always
+ depends_on:
+ <<: *airflow-common-depends-on
+ airflow-init:
+ condition: service_completed_successfully
+
+ airflow-worker:
+ <<: *airflow-common
+ command: celery worker
+ healthcheck:
+ test:
+ - "CMD-SHELL"
+ - 'celery --app airflow.executors.celery_executor.app inspect ping -d "celery@$${HOSTNAME}"'
+ interval: 30s
+ timeout: 10s
+ retries: 5
+ start_period: 30s
+ environment:
+ <<: *airflow-common-env
+ # Required to handle warm shutdown of the celery workers properly
+ # See https://airflow.apache.org/docs/docker-stack/entrypoint.html#signal-propagation
+ DUMB_INIT_SETSID: "0"
+ restart: always
+ depends_on:
+ <<: *airflow-common-depends-on
+ airflow-init:
+ condition: service_completed_successfully
+
+ airflow-triggerer:
+ <<: *airflow-common
+ command: triggerer
+ healthcheck:
+ test: ["CMD-SHELL", 'airflow jobs check --job-type TriggererJob --hostname "$${HOSTNAME}"']
+ interval: 30s
+ timeout: 10s
+ retries: 5
+ start_period: 30s
+ restart: always
+ depends_on:
+ <<: *airflow-common-depends-on
+ airflow-init:
+ condition: service_completed_successfully
+
+ airflow-init:
+ <<: *airflow-common
+ entrypoint: /bin/bash
+ # yamllint disable rule:line-length
+ command:
+ - -c
+ - |
+ function ver() {
+ printf "%04d%04d%04d%04d" $${1//./ }
+ }
+ airflow_version=$$(AIRFLOW__LOGGING__LOGGING_LEVEL=INFO && gosu airflow airflow version)
+ airflow_version_comparable=$$(ver $${airflow_version})
+ min_airflow_version=2.2.0
+ min_airflow_version_comparable=$$(ver $${min_airflow_version})
+ if (( airflow_version_comparable < min_airflow_version_comparable )); then
+ echo
+ echo -e "\033[1;31mERROR!!!: Too old Airflow version $${airflow_version}!\e[0m"
+ echo "The minimum Airflow version supported: $${min_airflow_version}. Only use this or higher!"
+ echo
+ exit 1
+ fi
+ if [[ -z "${AIRFLOW_UID}" ]]; then
+ echo
+ echo -e "\033[1;33mWARNING!!!: AIRFLOW_UID not set!\e[0m"
+ echo "If you are on Linux, you SHOULD follow the instructions below to set "
+ echo "AIRFLOW_UID environment variable, otherwise files will be owned by root."
+ echo "For other operating systems you can get rid of the warning with manually created .env file:"
+ echo "   See: https://airflow.apache.org/docs/apache-airflow/stable/howto/docker-compose/index.html#setting-the-right-airflow-user"
+ echo
+ fi
+ one_meg=1048576
+ mem_available=$$(($$(getconf _PHYS_PAGES) * $$(getconf PAGE_SIZE) / one_meg))
+ cpus_available=$$(grep -cE 'cpu[0-9]+' /proc/stat)
+ disk_available=$$(df / | tail -1 | awk '{print $$4}')
+ warning_resources="false"
+ if (( mem_available < 4000 )) ; then
+ echo
+ echo -e "\033[1;33mWARNING!!!: Not enough memory available for Docker.\e[0m"
+ echo "At least 4GB of memory required. You have $$(numfmt --to iec $$((mem_available * one_meg)))"
+ echo
+ warning_resources="true"
+ fi
+ if (( cpus_available < 2 )); then
+ echo
+ echo -e "\033[1;33mWARNING!!!: Not enough CPUS available for Docker.\e[0m"
+ echo "At least 2 CPUs recommended. You have $${cpus_available}"
+ echo
+ warning_resources="true"
+ fi
+ if (( disk_available < one_meg * 10 )); then
+ echo
+ echo -e "\033[1;33mWARNING!!!: Not enough Disk space available for Docker.\e[0m"
+ echo "At least 10 GBs recommended. You have $$(numfmt --to iec $$((disk_available * 1024 )))"
+ echo
+ warning_resources="true"
+ fi
+ if [[ $${warning_resources} == "true" ]]; then
+ echo
+ echo -e "\033[1;33mWARNING!!!: You have not enough resources to run Airflow (see above)!\e[0m"
+ echo "Please follow the instructions to increase amount of resources available:"
+ echo "   https://airflow.apache.org/docs/apache-airflow/stable/howto/docker-compose/index.html#before-you-begin"
+ echo
+ fi
+ mkdir -p /sources/logs /sources/dags /sources/plugins
+ chown -R "${AIRFLOW_UID}:0" /sources/{logs,dags,plugins}
+ exec /entrypoint airflow version
+ # yamllint enable rule:line-length
+ environment:
+ <<: *airflow-common-env
+ _AIRFLOW_DB_UPGRADE: 'true'
+ _AIRFLOW_WWW_USER_CREATE: 'true'
+ _AIRFLOW_WWW_USER_USERNAME: ${_AIRFLOW_WWW_USER_USERNAME:-airflow}
+ _AIRFLOW_WWW_USER_PASSWORD: ${_AIRFLOW_WWW_USER_PASSWORD:-airflow}
+ _PIP_ADDITIONAL_REQUIREMENTS: ''
+ user: "0:0"
+ volumes:
+ - ${AIRFLOW_PROJ_DIR:-.}:/sources
+
+ airflow-cli:
+ <<: *airflow-common
+ profiles:
+ - debug
+ environment:
+ <<: *airflow-common-env
+ CONNECTION_CHECK_MAX_COUNT: "0"
+ # Workaround for entrypoint issue. See: https://github.com/apache/airflow/issues/16252
+ command:
+ - bash
+ - -c
+ - airflow
+
+ # You can enable flower by adding "--profile flower" option e.g. docker-compose --profile flower up
+ # or by explicitly targeted on the command line e.g. docker-compose up flower.
+ # See: https://docs.docker.com/compose/profiles/
+ flower:
+ <<: *airflow-common
+ command: celery flower
+ profiles:
+ - flower
+ ports:
+ - "5555:5555"
+ healthcheck:
+ test: ["CMD", "curl", "--fail", "http://localhost:5555/"]
+ interval: 30s
+ timeout: 10s
+ retries: 5
+ start_period: 30s
+ restart: always
+ depends_on:
+ <<: *airflow-common-depends-on
+ airflow-init:
+ condition: service_completed_successfully
+
+volumes:
+ postgres-db-volume:
diff --git a/docs/hop-user-manual/modules/ROOT/assets/images/how-to-guides/run-hop-in-apache-airflow/airflow-dag-run.png b/docs/hop-user-manual/modules/ROOT/assets/images/how-to-guides/run-hop-in-apache-airflow/airflow-dag-run.png
deleted file mode 100644
index 6799ded7c6..0000000000
Binary files a/docs/hop-user-manual/modules/ROOT/assets/images/how-to-guides/run-hop-in-apache-airflow/airflow-dag-run.png and /dev/null differ
diff --git a/docs/hop-user-manual/modules/ROOT/assets/images/how-to-guides/run-hop-in-apache-airflow/airflow-logo.svg b/docs/hop-user-manual/modules/ROOT/assets/images/how-to-guides/run-hop-in-apache-airflow/airflow-logo.svg
new file mode 100644
index 0000000000..1a2fb29f60
--- /dev/null
+++ b/docs/hop-user-manual/modules/ROOT/assets/images/how-to-guides/run-hop-in-apache-airflow/airflow-logo.svg
@@ -0,0 +1,119 @@
+<?xml version="1.0" encoding="UTF-8" standalone="no"?>
+<svg
+ width="58.872005"
+ height="58.869011"
+ viewBox="0 0 58.872007 58.869011"
+ version="1.1"
+ id="svg142"
+ sodipodi:docname="airflow-logo.svg"
+ inkscape:version="1.2.2 (b0a8486541, 2022-12-01)"
+ xmlns:inkscape="http://www.inkscape.org/namespaces/inkscape"
+ xmlns:sodipodi="http://sodipodi.sourceforge.net/DTD/sodipodi-0.dtd"
+ xmlns="http://www.w3.org/2000/svg"
+ xmlns:svg="http://www.w3.org/2000/svg">
+ <sodipodi:namedview
+ id="namedview144"
+ pagecolor="#ffffff"
+ bordercolor="#666666"
+ borderopacity="1.0"
+ inkscape:showpageshadow="2"
+ inkscape:pageopacity="0.0"
+ inkscape:pagecheckerboard="0"
+ inkscape:deskcolor="#d1d1d1"
+ showgrid="false"
+ inkscape:zoom="3.9333333"
+ inkscape:cx="77.288136"
+ inkscape:cy="29.618644"
+ inkscape:window-width="3840"
+ inkscape:window-height="2067"
+ inkscape:window-x="2560"
+ inkscape:window-y="0"
+ inkscape:window-maximized="1"
+ inkscape:current-layer="svg142" />
+ <defs
+ id="defs128">
+ <clipPath
+ id="clip-path">
+ <path
+ id="Rectangle_1"
+ d="M 0,0 H 155.314 V 60 H 0 Z"
+ fill="none"
+ data-name="Rectangle 1" />
+ </clipPath>
+ </defs>
+ <g
+ id="logo"
+ transform="translate(-1305.486,-780.83999)">
+ <g
+ id="Group_2"
+ clip-path="url(#clip-path)"
+ data-name="Group 2"
+ transform="translate(1305,780.355)">
+ <g
+ id="Group_1"
+ data-name="Group 1"
+ transform="translate(0.486,0.486)">
+ <path
+ id="Path_1"
+ d="m 1307.562,880.867 28.187,-28.893 a 0.521,0.521 0 0 0 0.063,-0.666 c -1.714,-2.393 -4.877,-2.808 -6.049,-4.416 -3.472,-4.763 -4.353,-7.459 -5.845,-7.292 a 0.456,0.456 0 0 0 -0.271,0.143 l -10.182,10.438 c -5.858,6 -6.7,19.225 -6.852,30.3 a 0.552,0.552 0 0 0 0.949,0.386 z"
+ fill="#017cee"
+ data-name="Path 1"
+ transform="translate(-1306.613,-822.232)" />
+ <path
+ id="Path_2"
+ d="M 1405.512,908.489 1376.619,880.3 a 0.521,0.521 0 0 0 -0.667,-0.063 c -2.393,1.715 -2.808,4.877 -4.416,6.049 -4.763,3.472 -7.459,4.353 -7.292,5.845 a 0.456,0.456 0 0 0 0.143,0.27 l 10.438,10.182 c 6,5.858 19.225,6.7 30.3,6.852 a 0.552,0.552 0 0 0 0.387,-0.946 z"
+ fill="#00ad46"
+ data-name="Path 2"
+ transform="translate(-1346.876,-850.567)" />
+ <path
+ id="Path_3"
+ d="m 1373.909,902.252 c -3.28,-3.2 -4.8,-9.53 1.486,-22.583 -10.219,4.567 -13.8,10.57 -12.039,12.289 z"
+ fill="#04d659"
+ data-name="Path 3"
+ transform="translate(-1345.96,-850.233)" />
+ <path
+ id="Path_4"
+ d="m 1433.132,782.359 -28.186,28.893 a 0.52,0.52 0 0 0 -0.063,0.666 c 1.715,2.393 4.876,2.808 6.049,4.416 3.472,4.763 4.354,7.459 5.845,7.292 a 0.454,0.454 0 0 0 0.271,-0.143 l 10.182,-10.438 c 5.858,-6 6.7,-19.225 6.852,-30.3 a 0.553,0.553 0 0 0 -0.95,-0.386 z"
+ fill="#00c7d4"
+ data-name="Path 4"
+ transform="translate(-1375.21,-782.123)" />
+ <path
+ id="Path_5"
+ d="m 1426.9,881.155 c -3.2,3.28 -9.53,4.8 -22.584,-1.486 4.567,10.219 10.57,13.8 12.289,12.039 z"
+ fill="#11e1ee"
+ data-name="Path 5"
+ transform="translate(-1374.875,-850.233)" />
+ <path
+ id="Path_6"
+ d="m 1307,782.919 28.893,28.186 a 0.521,0.521 0 0 0 0.666,0.063 c 2.393,-1.715 2.808,-4.877 4.416,-6.049 4.763,-3.472 7.459,-4.353 7.292,-5.845 a 0.459,0.459 0 0 0 -0.143,-0.271 l -10.438,-10.182 c -6,-5.858 -19.225,-6.7 -30.3,-6.852 a 0.552,0.552 0 0 0 -0.386,0.95 z"
+ fill="#e43921"
+ data-name="Path 6"
+ transform="translate(-1306.766,-781.97)" />
+ <path
+ id="Path_7"
+ d="m 1405.8,804.711 c 3.28,3.2 4.8,9.53 -1.486,22.584 10.219,-4.567 13.8,-10.571 12.039,-12.289 z"
+ fill-rule="evenodd"
+ fill="#ff7557"
+ data-name="Path 7"
+ transform="translate(-1374.875,-797.859)" />
+ <path
+ id="Path_8"
+ d="m 1329.355,849.266 c 3.2,-3.28 9.53,-4.8 22.584,1.486 -4.567,-10.219 -10.57,-13.8 -12.289,-12.039 z"
+ fill="#0cb6ff"
+ data-name="Path 8"
+ transform="translate(-1322.503,-821.316)" />
+ <circle
+ id="Ellipse_1"
+ cx="1.26"
+ cy="1.26"
+ r="1.26"
+ fill="#4a4848"
+ data-name="Ellipse 1"
+ transform="translate(28.18,28.171)" />
+ <!-- <path id="Path_9" d="M1527.558 827.347a.229.229 0
0 1-.223-.223.458.458 0 0 1 .011-.123l2.766-7.214a.346.346 0 0 1
.357-.245h.758a.348.348 0 0 1 .357.245l2.754 7.214.022.123a.228.228 0 0
1-.223.223h-.568a.288.288 0 0 1-.19-.056.352.352 0 0
1-.089-.134l-.613-1.583h-3.657l-.613 1.583a.317.317 0 0 1-.1.134.269.269 0 0
1-.178.056zm4.795-2.732l-1.505-3.958-1.505 3.958zm3.322 4.85a.258.258 0 0
1-.189-.078.241.241 0 0 1-.067-.178v-7.4a.241.241 0 0 1 .067-.178.258.258 [...]
+ <!-- <path id="Path_10" d="M1527.2
827.081l-.061.061zm-.056-.279l-.08-.031zm2.766-7.214l.08.031zm1.472
0l-.081.029zm2.754 7.214l.084-.015a.064.064 0 0 0
0-.015zm.022.123h.086v-.015zm-.067.156l.06.061zm-.914.011l-.061.061.006.005zm-.089-.134l.081-.027zm-.613-1.583l.08-.031a.086.086
0 0 0-.08-.055zm-3.657 0v-.086a.086.086 0 0 0-.08.055zm-.613
1.583l-.08-.031zm-.1.134l.055.066zm4.047-2.676v.086a.086.086 0 0 0
.08-.116zm-1.505-3.958l.08-.03a.086.086 0 0 0-.16 0zm-1.505 [...]
+ <!-- <path id="Path_11" d="M1519.066 884.011a.581.581 0
0 1-.567-.567 1.151 1.151 0 0 1 .028-.312l7.026-18.328a.881.881 0 0 1
.906-.623h1.926a.882.882 0 0 1 .907.623l7 18.328.057.312a.583.583 0 0
1-.567.567h-1.445a.735.735 0 0 1-.482-.142.9.9 0 0
1-.226-.34l-1.558-4.023h-9.292l-1.558 4.023a.8.8 0 0 1-.255.34.688.688 0 0
1-.453.142zm12.181-6.94l-3.824-10.056-3.823 10.055zm8.184-10.538a.592.592 0 0
1-.652-.651v-1.53a.714.714 0 0 1 .17-.482.656.656 0 0 1 .482-.2h1.785 [...]
+ </g>
+ </g>
+ </g>
+</svg>
diff --git a/docs/hop-user-manual/modules/ROOT/assets/images/how-to-guides/run-hop-in-apache-airflow/apache-airflow-dag-available.png b/docs/hop-user-manual/modules/ROOT/assets/images/how-to-guides/run-hop-in-apache-airflow/apache-airflow-dag-available.png
new file mode 100644
index 0000000000..bffafb5c06
Binary files /dev/null and b/docs/hop-user-manual/modules/ROOT/assets/images/how-to-guides/run-hop-in-apache-airflow/apache-airflow-dag-available.png differ
diff --git a/docs/hop-user-manual/modules/ROOT/assets/images/how-to-guides/run-hop-in-apache-airflow/apache-airflow-dag-error.png b/docs/hop-user-manual/modules/ROOT/assets/images/how-to-guides/run-hop-in-apache-airflow/apache-airflow-dag-error.png
new file mode 100644
index 0000000000..dd3e0a2431
Binary files /dev/null and b/docs/hop-user-manual/modules/ROOT/assets/images/how-to-guides/run-hop-in-apache-airflow/apache-airflow-dag-error.png differ
diff --git a/docs/hop-user-manual/modules/ROOT/assets/images/how-to-guides/run-hop-in-apache-airflow/apache-airflow-dag-graph.png b/docs/hop-user-manual/modules/ROOT/assets/images/how-to-guides/run-hop-in-apache-airflow/apache-airflow-dag-graph.png
new file mode 100644
index 0000000000..fb9212c07f
Binary files /dev/null and b/docs/hop-user-manual/modules/ROOT/assets/images/how-to-guides/run-hop-in-apache-airflow/apache-airflow-dag-graph.png differ
diff --git a/docs/hop-user-manual/modules/ROOT/assets/images/how-to-guides/run-hop-in-apache-airflow/apache-airflow-dag-logs.png b/docs/hop-user-manual/modules/ROOT/assets/images/how-to-guides/run-hop-in-apache-airflow/apache-airflow-dag-logs.png
new file mode 100644
index 0000000000..d1faf9ddaf
Binary files /dev/null and b/docs/hop-user-manual/modules/ROOT/assets/images/how-to-guides/run-hop-in-apache-airflow/apache-airflow-dag-logs.png differ
diff --git a/docs/hop-user-manual/modules/ROOT/assets/images/how-to-guides/run-hop-in-apache-airflow/apache-airflow-dag-run.png b/docs/hop-user-manual/modules/ROOT/assets/images/how-to-guides/run-hop-in-apache-airflow/apache-airflow-dag-run.png
new file mode 100644
index 0000000000..17c8bbe036
Binary files /dev/null and b/docs/hop-user-manual/modules/ROOT/assets/images/how-to-guides/run-hop-in-apache-airflow/apache-airflow-dag-run.png differ
diff --git a/docs/hop-user-manual/modules/ROOT/assets/images/how-to-guides/run-hop-in-apache-airflow/apache-airflow-dag-runs.png b/docs/hop-user-manual/modules/ROOT/assets/images/how-to-guides/run-hop-in-apache-airflow/apache-airflow-dag-runs.png
new file mode 100644
index 0000000000..6eb01e72ef
Binary files /dev/null and b/docs/hop-user-manual/modules/ROOT/assets/images/how-to-guides/run-hop-in-apache-airflow/apache-airflow-dag-runs.png differ
diff --git a/docs/hop-user-manual/modules/ROOT/assets/images/how-to-guides/run-hop-in-apache-airflow/apache-airflow-empty-server.png b/docs/hop-user-manual/modules/ROOT/assets/images/how-to-guides/run-hop-in-apache-airflow/apache-airflow-empty-server.png
new file mode 100644
index 0000000000..1e74d89b6d
Binary files /dev/null and b/docs/hop-user-manual/modules/ROOT/assets/images/how-to-guides/run-hop-in-apache-airflow/apache-airflow-empty-server.png differ
diff --git a/docs/hop-user-manual/modules/ROOT/assets/images/how-to-guides/run-hop-in-apache-airflow/apache-airflow-pipeline-with-parameter.png b/docs/hop-user-manual/modules/ROOT/assets/images/how-to-guides/run-hop-in-apache-airflow/apache-airflow-pipeline-with-parameter.png
new file mode 100644
index 0000000000..0c685bd130
Binary files /dev/null and b/docs/hop-user-manual/modules/ROOT/assets/images/how-to-guides/run-hop-in-apache-airflow/apache-airflow-pipeline-with-parameter.png differ
diff --git a/docs/hop-user-manual/modules/ROOT/assets/images/how-to-guides/run-hop-in-apache-airflow/apache-airflow-run-config (1).png b/docs/hop-user-manual/modules/ROOT/assets/images/how-to-guides/run-hop-in-apache-airflow/apache-airflow-run-config (1).png
new file mode 100644
index 0000000000..29e4566b32
Binary files /dev/null and b/docs/hop-user-manual/modules/ROOT/assets/images/how-to-guides/run-hop-in-apache-airflow/apache-airflow-run-config (1).png differ
diff --git a/docs/hop-user-manual/modules/ROOT/assets/images/how-to-guides/run-hop-in-apache-airflow/apache-airflow-run-config.png b/docs/hop-user-manual/modules/ROOT/assets/images/how-to-guides/run-hop-in-apache-airflow/apache-airflow-run-config.png
new file mode 100644
index 0000000000..139e8cb114
Binary files /dev/null and b/docs/hop-user-manual/modules/ROOT/assets/images/how-to-guides/run-hop-in-apache-airflow/apache-airflow-run-config.png differ
diff --git a/docs/hop-user-manual/modules/ROOT/assets/images/how-to-guides/run-hop-in-apache-airflow/apache-airflow-trigger-dag-with-config.png b/docs/hop-user-manual/modules/ROOT/assets/images/how-to-guides/run-hop-in-apache-airflow/apache-airflow-trigger-dag-with-config.png
new file mode 100644
index 0000000000..80d16d0d2b
Binary files /dev/null and b/docs/hop-user-manual/modules/ROOT/assets/images/how-to-guides/run-hop-in-apache-airflow/apache-airflow-trigger-dag-with-config.png differ
diff --git a/docs/hop-user-manual/modules/ROOT/assets/images/how-to-guides/run-hop-in-apache-airflow/apache-airflow-trigger.png b/docs/hop-user-manual/modules/ROOT/assets/images/how-to-guides/run-hop-in-apache-airflow/apache-airflow-trigger.png
new file mode 100644
index 0000000000..41c2d36bfb
Binary files /dev/null and b/docs/hop-user-manual/modules/ROOT/assets/images/how-to-guides/run-hop-in-apache-airflow/apache-airflow-trigger.png differ
diff --git a/docs/hop-user-manual/modules/ROOT/assets/images/how-to-guides/run-hop-in-apache-airflow/apache-airflow-two-dags.png b/docs/hop-user-manual/modules/ROOT/assets/images/how-to-guides/run-hop-in-apache-airflow/apache-airflow-two-dags.png
new file mode 100644
index 0000000000..8358ee7568
Binary files /dev/null and b/docs/hop-user-manual/modules/ROOT/assets/images/how-to-guides/run-hop-in-apache-airflow/apache-airflow-two-dags.png differ
diff --git a/docs/hop-user-manual/modules/ROOT/pages/how-to-guides/run-hop-in-apache-airflow.adoc b/docs/hop-user-manual/modules/ROOT/pages/how-to-guides/run-hop-in-apache-airflow.adoc
index 6827c869b9..11eb1ee47f 100644
--- a/docs/hop-user-manual/modules/ROOT/pages/how-to-guides/run-hop-in-apache-airflow.adoc
+++ b/docs/hop-user-manual/modules/ROOT/pages/how-to-guides/run-hop-in-apache-airflow.adoc
@@ -16,30 +16,135 @@ under the License.
////
[[HopServer]]
:imagesdir: ../../assets/images
-:description: This tutorial explains how to run Apache Hop workflows and pipelines in Apache Airflow with the DockerOperator
+:description: This how-to explains how to run Apache Hop workflows and pipelines in Apache Airflow with the DockerOperator
-= Run workflows and pipelines from Apache Airflow
+= image:how-to-guides/run-hop-in-apache-airflow/airflow-logo.svg[Apache Airflow, width="75vw", align="center"]Run workflows and pipelines in Apache Airflow
-== Introduction
+== What is Apache Airflow?
-Apache Airflow is an open-source workflow management platform for data engineering pipelines.
+From the https://airflow.apache.org/[Apache Airflow website]:
-Airflow uses directed acyclic graphs (DAGs) to manage workflow orchestration. Tasks and dependencies are defined in Python and then Airflow manages the scheduling and execution. DAGs can be run either on a defined schedule (e.g. hourly or daily) or based on external event triggers (e.g. a file appearing in an AWS S3 bucket). DAGs can often be written in one Python file.
+[quote]
+Airflow is a platform created by the community to programmatically author, schedule and monitor workflows.
-Apache Hop workflows and pipelines can be used in Airflow through the https://airflow.apache.org/docs/apache-airflow-providers-docker/stable/_api/airflow/providers/docker/operators/docker/index.html[DockerOperator^].
-Alternatively, the https://airflow.apache.org/docs/apache-airflow/stable/howto/operator/bash.html[BashOperator^] to call xref:hop-run/index.adoc[Hop Run] could also be used.
+Airflow uses Directed Acyclic Graphs (or https://airflow.apache.org/docs/apache-airflow/1.10.10/concepts.html[DAGs^]). A DAG is a collection of all the tasks you want to run, organized in a way that reflects their relationships and dependencies.
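The "organized by relationships and dependencies" idea can be sketched in a few lines of plain Python: the standard-library `graphlib` module computes an execution order that respects a dependency graph, much like Airflow's scheduler does for a DAG's tasks. This is an illustration of the DAG concept only (the task names are hypothetical), not Airflow API code.

```python
from graphlib import TopologicalSorter

# A toy DAG shaped like start -> run_hop_pipeline -> end;
# each key lists the tasks it depends on.
dag = {
    "run_hop_pipeline": {"start"},   # runs only after start
    "end": {"run_hop_pipeline"},     # runs only after the pipeline task
}

order = list(TopologicalSorter(dag).static_order())
print(order)  # a valid execution order respecting the dependencies
```

Airflow resolves exactly this kind of ordering every time a DAG run is scheduled, just with real operators instead of strings.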
-== Sample Dag
+A DAG is defined in a Python script, which represents the DAG's structure (tasks and their dependencies) as code.
-Running a Hop workflow or pipeline through the Airflow DockerOperator uses
Docker to run a workflow or pipeline through a Docker container.
+From an Apache Hop point of view, our focus is different: Apache Hop wants to
enable citizen developers to be productive data engineers without the need to
write code. With that in mind, we don't need all the bells and whistles Apache
Airflow provides (but don't let that stop you from using Apache Airflow to its
full potential!).
-TIP: Check the xref:tech-manual::docker-container.adoc[Docker] docs for more
information on how to run Apache Hop workflows and pipelines with Docker. Check
xref:projects/index.adoc[Projects and environments] for more information and
best practices to set up your project .
+== Run Apache Airflow in Docker Compose
-In the example below, we'll run a sample pipeline. The project and environment
will be provided as mounted volumes to the container
(`LOCAL_PATH_TO_PROJECT_FOLDER` and `LOCAL_PATH_TO_ENV_FOLDER`).
+The goal of this page is to get a basic Airflow setup running to demonstrate
how Apache Airflow and Apache Hop can be used together. Check out the different
https://airflow.apache.org/docs/apache-airflow/stable/installation/index.html[installation
options^] if you want to build a production-ready Apache Airflow installation.
-Since your Airflow workflows probably will do more than just run a pipeline
(e.g. perform a `git clone` or `git pull` first), two DummyOperators (start and
end) were added to the sample.
+To keep things simple, we'll use Docker Compose to get Apache Airflow up and running in a matter of minutes. Even though https://docs.docker.com/compose/[Docker Compose^] has been said to be on the verge of extinction for quite a while now, it is still a quick and convenient way to experiment with data platforms that would otherwise be time-consuming and difficult to set up.
-[code,python]
+Apache Airflow provides a
https://airflow.apache.org/docs/apache-airflow/2.6.0/docker-compose.yaml[docker-compose.yaml^]
file. Our goal is to run Apache Hop workflows and pipelines in Apache Airflow,
so we're not interested in the Airflow sample DAGs that come with this
docker-compose file.
+
+Change the **AIRFLOW__CORE__LOAD_EXAMPLES** variable to "false" in the default file, and add the line **/var/run/docker.sock:/var/run/docker.sock** to the volumes section.
+Both changes have already been made in https://github.com/apache/hop/tree/master/docs/hop-user-manual/modules/ROOT/assets/files/how-to-guides/apache-airflow/docker-compose.yaml[the docker-compose.yaml^] in our GitHub repository.
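
Put together, the two changes described above look like this in the compose file (an excerpt only; the key names follow Airflow's stock docker-compose.yaml, so verify them against the version you downloaded):

```yaml
# excerpt of docker-compose.yaml: only the lines to change or add are shown
x-airflow-common:
  environment:
    # don't load Airflow's example DAGs
    AIRFLOW__CORE__LOAD_EXAMPLES: 'false'
  volumes:
    # let the DockerOperator talk to the host's Docker daemon
    - /var/run/docker.sock:/var/run/docker.sock
```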
+
+To run Apache Airflow from this docker-compose file, go to the directory where you saved the file and run
+
+[source, bash]
+----
+docker compose up
+----
+
+The various Apache Airflow services need a couple of moments to start. Once you see lines like the ones below in the logs, we're good to go.
+
+[source, bash]
+----
+apache-airflow-airflow-triggerer-1 | [2023-05-07 07:50:08 +0000] [24] [INFO]
Booting worker with pid: 24
+apache-airflow-airflow-triggerer-1 | [2023-05-07 07:50:08 +0000] [25] [INFO]
Booting worker with pid: 25
+apache-airflow-airflow-scheduler-1 | ____________ _____________
+apache-airflow-airflow-scheduler-1 | ____ |__( )_________ __/__
/________ __
+apache-airflow-airflow-scheduler-1 | ____ /| |_ /__ ___/_ /_ __ /_ __
\_ | /| / /
+apache-airflow-airflow-scheduler-1 | ___ ___ | / _ / _ __/ _ / / /_/
/_ |/ |/ /
+apache-airflow-airflow-scheduler-1 | _/_/ |_/_/ /_/ /_/ /_/
\____/____/|__/
+apache-airflow-airflow-scheduler-1 | [2023-05-07T07:50:08.601+0000]
{executor_loader.py:114} INFO - Loaded executor: CeleryExecutor
+apache-airflow-airflow-scheduler-1 | [2023-05-07T07:50:08.652+0000]
{scheduler_job_runner.py:823} INFO - Starting the scheduler
+apache-airflow-airflow-scheduler-1 | [2023-05-07T07:50:08.653+0000]
{scheduler_job_runner.py:830} INFO - Processing each file at most -1 times
+apache-airflow-airflow-scheduler-1 | [2023-05-07T07:50:08.657+0000]
{manager.py:165} INFO - Launched DagFileProcessorManager with pid: 34
+apache-airflow-airflow-scheduler-1 | [2023-05-07T07:50:08.658+0000]
{scheduler_job_runner.py:1576} INFO - Resetting orphaned tasks for active dag
runs
+apache-airflow-airflow-scheduler-1 | [2023-05-07T07:50:08.660+0000]
{settings.py:60} INFO - Configured default timezone Timezone('UTC')
+----
+
+Go to http://localhost:8080/home in your browser and log on with username
"airflow" and password "airflow".
+
+Even though we're not running in production, the default username and password can easily be changed: change the values for the **AIRFLOW_WWW_USER_USERNAME** and **AIRFLOW_WWW_USER_PASSWORD** variables in the docker-compose file, or use any of the available ways to https://docs.docker.com/compose/environment-variables/set-environment-variables/[work with variables^] in Docker Compose.
+
+After you log on, Apache Airflow will show you an empty list of DAGs. We're ready for the real fun.
+
+image:how-to-guides/run-hop-in-apache-airflow/apache-airflow-empty-server.png[Apache
Airflow - Empty server, width="75%"]
+
+== Your first Apache Airflow and Apache Hop DAG
+
+We'll use the Apache Airflow
https://airflow.apache.org/docs/apache-airflow-providers-docker/stable/_api/airflow/providers/docker/operators/docker/index.html[DockerOperator^]
to run Apache Hop workflows and pipelines from an embedded container in Apache
Airflow.
+
+Again, you don't need to be an Apache Airflow, Docker, or Python expert to create DAGs; we'll treat DAGs as just another text file.
+Since we'll use a container to run our workflows and pipelines, the
configuration in our DAG will look very similar to the environment variables
you'll pass to the xref:tech-manual::docker-container.adoc[short-lived Apache
Hop container].
+
+Let's take a closer look at a couple of things in the DAG we'll use. This will look very familiar if you've ever run Apache Hop workflows and pipelines in containers:
+
+Import the DockerOperator into your DAG:
+
+[source, python]
+----
+from airflow.operators.docker_operator import DockerOperator
+----
+
+Let's take a look at the end of the Apache Hop task first:
+
+[source, python]
+----
+mounts=[Mount(source='LOCAL_PATH_TO_PROJECT_FOLDER', target='/project',
type='bind'),
+ Mount(source='LOCAL_PATH_TO_ENV_FOLDER', target='/project-config',
type='bind')],
+----
+
+The mounts section is where we'll link your project and environment folders to
the container.
+**LOCAL_PATH_TO_PROJECT_FOLDER** is the path to the project folder on your
local file system (the folder where you keep your hop-config.json file,
metadata folder and workflows and pipelines). This folder will be mounted as
/project inside the container.
+**LOCAL_PATH_TO_ENV_FOLDER** is similar but points to the folder where your
environment configuration (json) files are. This folder will be mounted as
/project-config inside the container.
+
+Define and configure the pipeline in your DAG task:
+
+[source, python]
+----
+hop = DockerOperator(
+ task_id='sample-pipeline',
+    # use the Apache Hop Docker image. Add your tag here using the apache/hop:<tag> syntax
+ image='apache/hop',
+ api_version='auto',
+ auto_remove=True,
+ environment= {
+ 'HOP_RUN_PARAMETERS': 'INPUT_DIR=',
+ 'HOP_LOG_LEVEL': 'Basic',
+ 'HOP_FILE_PATH': '${PROJECT_HOME}/transforms/null-if-basic.hpl',
+ 'HOP_PROJECT_DIRECTORY': '/project',
+ 'HOP_PROJECT_NAME': 'hop-airflow-sample',
+ 'HOP_ENVIRONMENT_NAME': 'env-hop-airflow-sample.json',
+ 'HOP_ENVIRONMENT_CONFIG_FILE_NAME_PATHS':
'/project-config/env-hop-airflow-sample.json',
+ 'HOP_RUN_CONFIG': 'local'
+ },
+----
+
+The parameters to specify here are:
+
+* **task_id**: a unique id for this Airflow task in the DAG
+* **image**: we use "apache/hop" in this example, which will always grab the
latest release. Add a tag to use a specific Apache Hop release, e.g.
"apache/hop:2.4.0" or "apache/hop:Development" for the very latest development
version
+* **environment** is where we'll tell the DockerOperator which pipeline to run
and provide additional configuration. The environment variables used here are
exactly what you would pass to a standalone short-lived container without
Airflow:
+** HOP_RUN_PARAMETERS: parameters to pass to the workflow or pipeline
+** HOP_LOG_LEVEL: the logging level to use with your workflow or pipeline
+** HOP_FILE_PATH: the path to the workflow or pipeline you want to use. This
is the path in the container and is relative to the project folder
+** HOP_PROJECT_DIRECTORY: the folder where your project files live. In this
example, this is the /project folder we mounted in the previous section.
+** HOP_PROJECT_NAME: your Apache Hop project's name. This will only be used
internally (and will show in the logs). Your project name is not necessarily
the same name you used to develop the project in Hop Gui, but keeping things
consistent never hurts.
+** HOP_ENVIRONMENT_NAME: similar to the project name, this is the name for the
environment that will be created through hop-conf when the container starts.
+** HOP_ENVIRONMENT_CONFIG_FILE_NAME_PATHS: the paths to your environment
configuration files. These file paths should be relative to the /project-config
folder we mounted in the previous section.
+** HOP_RUN_CONFIG: the workflow or pipeline run configuration to use. Your
mileage may vary, but in the vast majority of cases, using a local run
configuration will be what you need.
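
For instance, HOP_RUN_PARAMETERS takes comma-separated NAME=value pairs, the same format hop-run's --parameters option accepts. A tiny helper (hypothetical, not part of Hop or Airflow) to build that string from a Python dict in your DAG:

```python
def hop_run_parameters(params: dict) -> str:
    """Serialize a dict to the comma-separated NAME=value string
    that HOP_RUN_PARAMETERS expects."""
    return ",".join(f"{name}={value}" for name, value in params.items())

# e.g. for the environment block of the sample DAG:
print(hop_run_parameters({"INPUT_DIR": "/files/input"}))  # INPUT_DIR=/files/input
```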
+
+That's everything we need to specify for a first run. This DAG will look like
the one below:
+
+[source, python]
----
from datetime import datetime, timedelta
from airflow import DAG
@@ -66,16 +171,16 @@ with DAG('sample-pipeline', default_args=default_args,
schedule_interval=None, c
end_dag = DummyOperator(
task_id='end_dag'
)
- hop = DockerOperator(
+ hop = DockerOperator(
task_id='sample-pipeline',
-# use the Apache Hop Docker image. Add your tags here in the default
apache/hop:<TAG> syntax
+        # use the Apache Hop Docker image. Add your tag here using the apache/hop:<tag> syntax
image='apache/hop',
api_version='auto',
auto_remove=True,
environment= {
- 'HOP_RUN_PARAMETERS': 'INPUT_DIR=<YOUR_INPUT_PATH>',
+ 'HOP_RUN_PARAMETERS': 'INPUT_DIR=',
'HOP_LOG_LEVEL': 'Basic',
- 'HOP_FILE_PATH': '${PROJECT_HOME}/etl/sample-pipeline.hpl',
+ 'HOP_FILE_PATH': '${PROJECT_HOME}/transforms/null-if-basic.hpl',
'HOP_PROJECT_DIRECTORY': '/project',
'HOP_PROJECT_NAME': 'hop-airflow-sample',
'HOP_ENVIRONMENT_NAME': 'env-hop-airflow-sample.json',
@@ -84,16 +189,162 @@ with DAG('sample-pipeline', default_args=default_args,
schedule_interval=None, c
},
docker_url="unix://var/run/docker.sock",
network_mode="bridge",
- mounts=[Mount(source='<LOCAL_PATH_TO_PROJECT_FOLDER>',
target='/project', type='bind'), Mount(source='LOCAL_PATH_TO_ENV_FOLDER',
target='/project-config', type='bind')],
+ mounts=[Mount(source='LOCAL_PATH_TO_PROJECT_FOLDER',
target='/project', type='bind'), Mount(source='LOCAL_PATH_TO_ENV_FOLDER',
target='/project-config', type='bind')],
force_pull=False
)
start_dag >> hop >> end_dag
----
-After you deploy this DAG to your Airflow dags folder (e.g. as
`hop-airflow-sample.py`), it will be picked up by Apache Airflow and is ready
to run.
+== Deploy and run your first DAG
+
+All it takes to deploy your DAG is to put it in Airflow's dags folder. Our docker-compose setup created a dags folder in the directory where you started the compose file. Airflow scans this folder periodically (every couple of minutes by default).
+
+Save the DAG we just created in your dags folder as `apache-hop-dag-simple.py`. After a short wait, your DAG will show up in the list of DAGs.
+
+If there are any syntax errors in your DAG, Airflow will let you know. Expand
the error dialog for more details about the error.
+
+image:how-to-guides/run-hop-in-apache-airflow/apache-airflow-dag-error.png[Apache
Airflow - DAG error, width="45%"]
+
+image:how-to-guides/run-hop-in-apache-airflow/apache-airflow-dag-available.png[Apache
Airflow - DAG available, width="75%"]
+
+Click on the **sample-pipeline** DAG to see more details about it. From the
tab list at the top of the page, select "Code" to review the DAG you just
deployed, or "Graph" to see the graph representation of the DAG. This graph is
extremely simple, but we're exploring Apache Airflow, so that's intentional.
+
+image:how-to-guides/run-hop-in-apache-airflow/apache-airflow-dag-graph.png[Apache
Airflow - DAG graph, width="65%"]
+
+To run this DAG, click the play icon with the **Trigger DAG** option. The icon
is available from multiple locations in the Apache Airflow user interface. It
is almost always available in the upper right corner.
+
+image:how-to-guides/run-hop-in-apache-airflow/apache-airflow-dag-run.png[Apache
Airflow - trigger DAG, width="45%"]
+
+Your DAG will run in the background. To follow up and check the logs, click on
your DAG name to go to its details page.
+
+image:how-to-guides/run-hop-in-apache-airflow/apache-airflow-dag-logs.png[Apache
Airflow - DAG logs, width="45%"]
+
+[source, bash]
+----
+[2023-05-07, 13:54:39 UTC] {docker.py:391} INFO - 2023/05/07 13:54:39 - Ouput.0 - Finished processing (I=0, O=0, R=5, W=5, U=0, E=0)
+[2023-05-07, 13:54:39 UTC] {docker.py:391} INFO - 2023/05/07 13:54:39 -
null-if-basic - Pipeline duration : 0.45 seconds [ 0.450 ]
+[2023-05-07, 13:54:39 UTC] {docker.py:391} INFO - HopRun exit.
+[2023-05-07, 13:54:39 UTC] {docker.py:391} INFO - 2023/05/07 13:54:39 -
null-if-basic - Execution finished on a local pipeline engine with run
configuration 'local'
+[2023-05-07, 13:54:40 UTC] {taskinstance.py:1373} INFO - Marking task as
SUCCESS. dag_id=sample-pipeline, task_id=sample-pipeline,
execution_date=20230507T135409, start_date=20230507T135411,
end_date=20230507T135440
+[2023-05-07, 13:54:40 UTC] {local_task_job_runner.py:232} INFO - Task exited
with return code 0
+----
+
+When you return to the Airflow home screen, your DAG will now show green
circles for successful runs.
+
+image:how-to-guides/run-hop-in-apache-airflow/apache-airflow-dag-runs.png[Apache Airflow - DAG runs, width="90%"]
+
+== Using variables and parameters in a DAG
+
+Your real-life pipelines will be more complex than the extremely simple
example pipeline we just ran.
+
+In the basic example we just ran, we passed an environment file but didn't use it. In many cases you'll want to use variables from your environment files, and you may also want to pass parameters to your pipelines and workflows. Let's have a closer look at that.
+
+Save the environment configuration below in a config folder next to your project folder. We'll use the pipeline `pipelines/pipeline-with-parameter.hpl` from the samples project to print a pipeline parameter and a variable from the environment configuration file to the logs. Again, these examples are extremely simple and your real-life projects will be more complex, but the process remains the same.
+
+[source, json]
+----
+{
+ "variables" : [ {
+ "name" : "ENV_VARIABLE",
+ "value" : "variable value",
+ "description" : ""
+ } ]
+}
+----
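
This structure is what hop-conf consumes when the container starts. As a quick sanity check, you can parse a configuration file and list the variables it defines (a hypothetical helper, not part of Hop):

```python
import json

def env_variables(config_text: str) -> dict:
    """Return {name: value} for the variables in a Hop environment config file."""
    config = json.loads(config_text)
    return {v["name"]: v["value"] for v in config.get("variables", [])}

sample = '''
{ "variables" : [ { "name" : "ENV_VARIABLE",
                    "value" : "variable value",
                    "description" : "" } ] }
'''
print(env_variables(sample))  # {'ENV_VARIABLE': 'variable value'}
```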
+
+This pipeline is again very basic. All we'll do is accept a parameter and
print it in the logs:
+
+image:how-to-guides/run-hop-in-apache-airflow/apache-airflow-pipeline-with-parameter.png[Apache
Airflow - run a pipeline with parameters, width="75%"]
+
+We'll create a new DAG for this example. Most of it will be the same or
similar to the previous example, with some minor changes:
+
+First of all, we'll need some additional imports at the start of the DAG:
+[source, python]
+----
+from airflow import DAG
+from airflow.models import Variable
+from airflow.operators.bash_operator import BashOperator
+----
+
+Next, we'll add the parameter to this pipeline's configuration and tell Airflow to pick up its value from the run configuration we'll pass to the DAG later on.
+We'll also use logging level Detailed to make sure we can see the parameters
we'll pass to the pipeline.
+
+[source, python]
+----
+environment= {
+ 'HOP_RUN_PARAMETERS': 'PRM_EXAMPLE=',
+ 'HOP_LOG_LEVEL': 'Detailed',
+        'HOP_FILE_PATH': '${PROJECT_HOME}/pipelines/pipeline-with-parameter.hpl',
+ 'HOP_PROJECT_NAME': 'hop-airflow-sample',
+ 'HOP_ENVIRONMENT_NAME': 'env-hop-airflow-sample.json',
+ 'HOP_ENVIRONMENT_CONFIG_FILE_NAME_PATHS':
'/project-config/hop-airflow-config.json',
+ 'HOP_RUN_CONFIG': 'local'
+ },
+----
+
+Also, we really need the environment configuration file this time, so make
sure your mounts are correct.
+
+[source, python]
+----
+mounts=[Mount(source='<YOUR_PROJECT_PATH>/', target='/project', type='bind'),
+ Mount(source='<YOUR_CONFIG_PATH>/config/',
target='/project-config', type='bind')],
+----
+
+Add this new DAG to your dags folder and wait for it to appear in your Apache
Airflow console.
+
+To run this DAG with parameters, we'll use the **Trigger DAG w/ config**
option. We'll specify the **prm_example** value that Airflow will pass to the
**PRM_EXAMPLE** parameter in the pipeline. The syntax to use is shown below.
Click "Trigger" when you're done.
+
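
The configuration you paste into the **Trigger DAG w/ config** dialog is a small JSON document; for this example, something like:

```json
{
  "prm_example": "EXAMPLE VALUE"
}
```

How the DAG reads that value depends on how you wire it up; a common pattern (an assumption, not spelled out in this guide) is to template `dag_run.conf` values into the task's environment with Jinja.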
+image:how-to-guides/run-hop-in-apache-airflow/apache-airflow-two-dags.png[Apache
Airflow - two DAGs, width="90%"]
+
+[%autowidth, cols="3,3,3", frame=none, grid=none]
+|===
+|
image:how-to-guides/run-hop-in-apache-airflow/apache-airflow-trigger-dag-with-config.png[Apache
Airflow - trigger DAG with config]
+|
image:how-to-guides/run-hop-in-apache-airflow/apache-airflow-run-config.png[Apache
Airflow - trigger DAG with config]
+|
image:how-to-guides/run-hop-in-apache-airflow/apache-airflow-trigger.png[Apache
Airflow - trigger DAG with config]
+|===
+
+Your DAG logs will now show the environment variable and the parameter we used
in this example:
+
+[source, bash]
+----
+[2023-05-08, 08:21:34 UTC] {docker.py:391} INFO - 2023/05/08 08:21:34 -
pipeline-with-parameter - Pipeline has allocated 5 threads and 4 rowsets.
+[2023-05-08, 08:21:34 UTC] {docker.py:391} INFO - 2023/05/08 08:21:34 -
generate 1 row.0 - Starting to run...
+[2023-05-08, 08:21:34 UTC] {docker.py:391} INFO - 2023/05/08 08:21:34 -
generate 1 row.0 - Finished processing (I=0, O=0, R=0, W=1, U=0, E=0)
+[2023-05-08, 08:21:34 UTC] {docker.py:391} INFO - 2023/05/08 08:21:34 - get
${PRM_EXAMPLE}.0 - field [example] has value [EXAMPLE VALUE]
+[2023-05-08, 08:21:34 UTC] {docker.py:391} INFO - 2023/05/08 08:21:34 - get
${PRM_EXAMPLE}.0 - Finished processing (I=0, O=0, R=1, W=1, U=0, E=0)
+[2023-05-08, 08:21:34 UTC] {docker.py:391} INFO - 2023/05/08 08:21:34 - write
parameter to log.0 -
+[2023-05-08, 08:21:34 UTC] {docker.py:391} INFO - 2023/05/08 08:21:34 - get
${ENV_VARIABLE}.0 - field [env_variable] has value [variable value]
+[2023-05-08, 08:21:34 UTC] {docker.py:391} INFO - 2023/05/08 08:21:34 - write
env_variable to log.0 -
+[2023-05-08, 08:21:34 UTC] {docker.py:391} INFO - 2023/05/08 08:21:34 - write
parameter to log.0 - Finished processing (I=0, O=0, R=1, W=1, U=0, E=0)
+[2023-05-08, 08:21:34 UTC] {docker.py:391} INFO - 2023/05/08 08:21:34 - get
${ENV_VARIABLE}.0 - Finished processing (I=0, O=0, R=1, W=1, U=0, E=0)
+[2023-05-08, 08:21:34 UTC] {docker.py:391} INFO - 2023/05/08 08:21:34 - write
env_variable to log.0 - Finished processing (I=0, O=0, R=1, W=1, U=0, E=0)
+----
+
+== Scheduling a DAG in Apache Airflow
+
+So far, we've looked at DAGs that we ran manually and ad hoc. There are lots of https://airflow.apache.org/docs/apache-airflow/stable/authoring-and-scheduling/index.html[well-documented^] options to schedule DAGs in Apache Airflow. Since scheduling your DAGs is not really Apache Hop related, we'll only cover it briefly here.
+
+One option is to provide a cron string to schedule your DAG execution. For
example, to run a specific DAG at 10:00 am every morning, we'll change the
schedule_interval from None to a cron expression in the "with DAG" line in our
DAG (line breaks added for readability):
+
+[source, python]
+----
+ with DAG(
+ 'sample-pipeline',
+ default_args=default_args,
+ schedule_interval='0 10 * * *',
+ catchup=False,
+ is_paused_upon_creation=False
+ ) as dag:
+----
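
The cron expression `0 10 * * *` means "at minute 0 of hour 10, every day". As a stdlib-only illustration of those semantics (this is only a sketch, not how Airflow's scheduler is implemented):

```python
from datetime import datetime, timedelta

def next_daily_run(now: datetime, hour: int = 10) -> datetime:
    """Next time the cron schedule '0 10 * * *' would fire after `now`."""
    candidate = now.replace(hour=hour, minute=0, second=0, microsecond=0)
    if candidate <= now:
        candidate += timedelta(days=1)  # today's slot already passed
    return candidate

print(next_daily_run(datetime(2023, 5, 7, 11, 30)))  # 2023-05-08 10:00:00
```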
+
+For a more detailed description of the scheduling options in Apache Airflow,
you may find
https://medium.com/@thehippieandtheboss/how-to-define-the-dag-schedule-interval-parameter-cb2d81d2a02e[this
Medium post^] helpful.
-Check the Airflow logs for the `sample-pipeline` task for the full Hop logs of
the pipeline execution.
+== Summary
-image:how-to-guides/run-hop-in-apache-airflow/airflow-dag-run.png[Apache
Airflow - Hop DAG run, width="45%"]
+We've covered the basics of running Apache Hop pipelines (or workflows) in
Apache Airflow with the DockerOperator.
+There are other options: you could use Airflow's https://airflow.apache.org/docs/apache-airflow/stable/howto/operator/bash.html[BashOperator^] to call xref:hop-run/index.adoc[hop-run] directly, or the https://airflow.apache.org/docs/apache-airflow-providers-http/stable/operators.html[HTTP operator^] to run pipelines or workflows on a remote Hop server.
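
As a rough sketch of the BashOperator route (the flag names follow hop-run's CLI, but treat the exact paths and options as assumptions to verify against your installation), you could build the command string in your DAG and pass it as the operator's `bash_command`:

```python
def hop_run_command(hop_home, project, file_path, run_config="local", parameters=None):
    """Build a hop-run invocation string for an Airflow BashOperator.
    Assumes a local Hop installation at `hop_home` (hypothetical path)."""
    parts = [f"{hop_home}/hop-run.sh",
             "--project", project,
             "--file", f"'{file_path}'",
             "--runconfig", run_config]
    if parameters:
        parts += ["--parameters",
                  ",".join(f"{k}={v}" for k, v in parameters.items())]
    return " ".join(parts)

print(hop_run_command("/opt/hop", "hop-airflow-sample",
                      "${PROJECT_HOME}/pipelines/pipeline-with-parameter.hpl",
                      parameters={"PRM_EXAMPLE": "hello"}))
```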