jedcunningham commented on code in PR #34381: URL: https://github.com/apache/airflow/pull/34381#discussion_r1326603097
########## airflow/providers/amazon/aws/config_templates/config.yml: ########## @@ -0,0 +1,131 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. + +--- + +aws_ecs_executor: + description: | + This section only applies if you are using the AwsEcsExecutor in + Airflow's ``[core]`` configuration. + For more information on any of these execution parameters, see the link below: + https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/ecs/client/run_task.html + For boto3 credential management, see + https://boto3.amazonaws.com/v1/documentation/api/latest/guide/configuration.html + options: + conn_id: + description: | + The Airflow connection (i.e. credentials) used by the ECS executor to make API calls to AWS ECS. + version_added: "2.8" + type: string + example: "aws_default" + default: "aws_default" + region: + description: | + The name of the AWS Region where Amazon ECS is configured. Required. + version_added: "2.8" + type: string + example: "us-east-1" + default: ~ + assign_public_ip: + description: | + Whether to assign a public IP address to the containers launched by the ECS executor. + For more info see url to Boto3 docs above. 
+ version_added: "2.8" + type: boolean + example: "True" + default: "False" + cluster: + description: | + Name of the Amazon ECS Cluster. Required. + version_added: "2.8" + type: string + example: "ecs_executor_cluster" + default: ~ + container_name: + description: | + Name of the container that will be used to execute Airflow tasks via the ECS executor. + The container should be specified in the ECS Task Definition and will receive an airflow + CLI command as an additional parameter to its entrypoint. For more info see url to Boto3 + docs above. Required. + version_added: "2.8" + type: string + example: "ecs_executor_container" + default: ~ + launch_type: + description: | + Launch type can either be 'FARGATE' OR 'EC2'. For more info see url to + Boto3 docs above. + + If the launch type is EC2, the executor will attempt to place tasks on + empty EC2 instances. If there are no EC2 instances available, no task + is placed and this function will be called again in the next heartbeat. + + If the launch type is FARGATE, this will run the tasks on new AWS Fargate + instances. + version_added: "2.8" + type: string + example: "FARGATE" + default: "FARGATE" + platform_version: + description: | + The platform version the task uses. A platform version is only specified + for tasks hosted on Fargate. If one isn't specified, the LATEST platform + version is used. + version_added: "2.8" + type: string + example: "1.4.0" + default: "LATEST" + security_groups: + description: | + The comma-separated IDs of the security groups associated with the task. If you + don't specify a security group, the default security group for the VPC is used. + There's a limit of 5 security groups. For more info see url to Boto3 docs above. + version_added: "2.8" + type: string + example: "sg-XXXX,sg-YYYY" + default: ~ + subnets: + description: | + The comma-separated IDs of the subnets associated with the task or service. + There's a limit of 16 subnets. For more info see url to Boto3 docs above.
+ version_added: "2.8" + type: string + example: "subnet-XXXXXXXX,subnet-YYYYYYYY" + default: ~ + task_definition: + description: | + The family and revision (family:revision) or full ARN of the task definition + to run. If a revision isn't specified, the latest ACTIVE revision is used. + For more info see url to Boto3 docs above. + version_added: "2.8" + type: string + example: executor_task_definition:LATEST + default: ~ + max_run_task_attempts: + description: | + The maximum number of times the ECS executor should attempt to run a task. + version_added: "2.8" + type: int + example: "3" + default: "3" Review Comment: This feels like a dangerous default (and maybe even feature). Don't we want Airflow's retries to control this instead? ########## airflow/providers/amazon/aws/config_templates/config.yml: ########## @@ -0,0 +1,131 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. + +--- + +aws_ecs_executor: + description: | + This section only applies if you are using the AwsEcsExecutor in + Airflow's ``[core]`` configuration.
+ For more information on any of these execution parameters, see the link below: + https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/ecs/client/run_task.html + For boto3 credential management, see + https://boto3.amazonaws.com/v1/documentation/api/latest/guide/configuration.html + options: + conn_id: + description: | + The Airflow connection (i.e. credentials) used by the ECS executor to make API calls to AWS ECS. + version_added: "2.8" Review Comment: This should be the version of the provider this is added to, not the next core minor. ########## airflow/providers/amazon/aws/executors/ecs/Setup_guide.md: ########## @@ -0,0 +1,148 @@ +<!-- + Licensed to the Apache Software Foundation (ASF) under one + or more contributor license agreements. See the NOTICE file + distributed with this work for additional information + regarding copyright ownership. The ASF licenses this file + to you under the Apache License, Version 2.0 (the + "License"); you may not use this file except in compliance + with the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, + software distributed under the License is distributed on an + "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + KIND, either express or implied. See the License for the + specific language governing permissions and limitations + under the License. + --> + +# Setting up an ECS Executor for Apache Airflow + +There are 3 steps involved in getting an ECS Executor to work in Apache Airflow: + +1. Creating a database that Airflow and the Executor can connect to. Review Comment: ```suggestion 1. Creating a database that Airflow and the tasks running in ECS can connect to. 
``` ########## airflow/providers/amazon/aws/executors/ecs/Dockerfile: ########## @@ -0,0 +1,86 @@ +# hadolint ignore=DL3007 +FROM apache/airflow:latest +USER root +RUN apt-get update \ + && apt-get install -y --no-install-recommends unzip \ + # The below helps to keep the image size down + && apt-get clean \ + && rm -rf /var/lib/apt/lists/* +RUN curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip" +RUN unzip awscliv2.zip && ./aws/install Review Comment: If you combine these into one step you can toss the zip file so it doesn't sit in a layer. Might also want to toss the expanded files too? ########## airflow/providers/amazon/aws/executors/ecs/README.md: ########## @@ -0,0 +1,196 @@ +<!-- Review Comment: Good doc. It should probably be moved to the providers real docs though, instead of being in the source here. ########## airflow/providers/amazon/aws/executors/ecs/Dockerfile: ########## @@ -0,0 +1,86 @@ +# hadolint ignore=DL3007 +FROM apache/airflow:latest +USER root +RUN apt-get update \ + && apt-get install -y --no-install-recommends unzip \ + # The below helps to keep the image size down + && apt-get clean \ + && rm -rf /var/lib/apt/lists/* +RUN curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip" +RUN unzip awscliv2.zip && ./aws/install + +# Add a script to run the aws s3 sync command when the container is run +COPY <<"EOF" /entrypoint.sh +#!/bin/bash + +echo "Downloading DAGs from S3 bucket" +aws s3 sync "$S3_URL" "$CONTAINER_DAG_PATH" + +exec "$@" +EOF + +RUN chmod +x /entrypoint.sh + +USER airflow + +## Installing Python Dependencies +# Python dependencies can be installed by providing a requirements.txt. +# If the file is in a different location, use the requirements_path build argument to specify +# the file path. 
+ARG requirements_path=./requirements.txt +ENV REQUIREMENTS_PATH=$requirements_path + +# Uncomment the two lines below to copy the requirements.txt file to the container, and +# install the dependencies. +# COPY --chown=airflow:root $REQUIREMENTS_PATH /opt/airflow/requirements.txt +# RUN pip install --no-cache-dir -r /opt/airflow/requirements.txt + + +## AWS Authentication +# The image requires access to AWS services. This Dockerfile supports 2 ways to authenticate with AWS. +# The first is using build arguments where you can provide the AWS credentials as arguments +# passed when building the image. The other option is to copy the ~/.aws folder to the container, +# and authenticate using the credentials in that folder. +# If you would like to use an alternative method of authentication, feel free to make the +# necessary changes to this file. + +# Use these arguments to provide AWS authentication information +ARG aws_access_key_id +ARG aws_secret_access_key +ARG aws_default_region +ARG aws_session_token + +ENV AWS_ACCESS_KEY_ID=$aws_access_key_id Review Comment: Not very familiar with ECS, but shouldn't we inject these somehow instead of baking creds into the image directly? That's a red flag pattern in my eyes. 
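As a hedged sketch of the injection alternative the comment points toward (not part of this PR; the account ID and role names below are placeholders): ECS can supply temporary credentials at runtime through the task's IAM role, declared in the task definition via `taskRoleArn`, so no keys need to be baked into the image:

```json
{
  "family": "airflow-ecs-executor-task",
  "taskRoleArn": "arn:aws:iam::123456789012:role/AirflowTaskRole",
  "executionRoleArn": "arn:aws:iam::123456789012:role/AirflowTaskExecutionRole",
  "containerDefinitions": [
    {
      "name": "ecs_executor_container",
      "image": "my-airflow-image"
    }
  ]
}
```

Inside the container, the AWS CLI and boto3 resolve the role's credentials automatically from the task metadata endpoint, so the `ENV AWS_ACCESS_KEY_ID=...` block in the Dockerfile becomes unnecessary.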
########## airflow/providers/amazon/aws/executors/ecs/Dockerfile: ########## @@ -0,0 +1,86 @@ +# hadolint ignore=DL3007 +FROM apache/airflow:latest +USER root +RUN apt-get update \ + && apt-get install -y --no-install-recommends unzip \ + # The below helps to keep the image size down + && apt-get clean \ + && rm -rf /var/lib/apt/lists/* +RUN curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip" +RUN unzip awscliv2.zip && ./aws/install + +# Add a script to run the aws s3 sync command when the container is run +COPY <<"EOF" /entrypoint.sh +#!/bin/bash + +echo "Downloading DAGs from S3 bucket" +aws s3 sync "$S3_URL" "$CONTAINER_DAG_PATH" + +exec "$@" +EOF + +RUN chmod +x /entrypoint.sh + +USER airflow + +## Installing Python Dependencies +# Python dependencies can be installed by providing a requirements.txt. +# If the file is in a different location, use the requirements_path build argument to specify +# the file path. +ARG requirements_path=./requirements.txt +ENV REQUIREMENTS_PATH=$requirements_path + +# Uncomment the two lines below to copy the requirements.txt file to the container, and +# install the dependencies. +# COPY --chown=airflow:root $REQUIREMENTS_PATH /opt/airflow/requirements.txt +# RUN pip install --no-cache-dir -r /opt/airflow/requirements.txt + + +## AWS Authentication +# The image requires access to AWS services. This Dockerfile supports 2 ways to authenticate with AWS. +# The first is using build arguments where you can provide the AWS credentials as arguments +# passed when building the image. The other option is to copy the ~/.aws folder to the container, +# and authenticate using the credentials in that folder. +# If you would like to use an alternative method of authentication, feel free to make the +# necessary changes to this file. 
+ +# Use these arguments to provide AWS authentication information +ARG aws_access_key_id +ARG aws_secret_access_key +ARG aws_default_region +ARG aws_session_token + +ENV AWS_ACCESS_KEY_ID=$aws_access_key_id +ENV AWS_SECRET_ACCESS_KEY=$aws_secret_access_key +ENV AWS_DEFAULT_REGION=$aws_default_region +ENV AWS_SESSION_TOKEN=$aws_session_token + +# Uncomment the line below to authenticate to AWS using the ~/.aws folder +# Keep in mind the docker build context when placing .aws folder +# COPY --chown=airflow:root ./.aws /home/airflow/.aws Review Comment: This pattern also seems problematic 🤷♂️ ########## airflow/providers/amazon/aws/executors/ecs/Dockerfile: ########## @@ -0,0 +1,86 @@ +# hadolint ignore=DL3007 +FROM apache/airflow:latest +USER root +RUN apt-get update \ + && apt-get install -y --no-install-recommends unzip \ + # The below helps to keep the image size down + && apt-get clean \ + && rm -rf /var/lib/apt/lists/* +RUN curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip" +RUN unzip awscliv2.zip && ./aws/install + +# Add a script to run the aws s3 sync command when the container is run +COPY <<"EOF" /entrypoint.sh +#!/bin/bash + +echo "Downloading DAGs from S3 bucket" +aws s3 sync "$S3_URL" "$CONTAINER_DAG_PATH" + +exec "$@" +EOF + +RUN chmod +x /entrypoint.sh + +USER airflow + +## Installing Python Dependencies +# Python dependencies can be installed by providing a requirements.txt. +# If the file is in a different location, use the requirements_path build argument to specify +# the file path. +ARG requirements_path=./requirements.txt +ENV REQUIREMENTS_PATH=$requirements_path + +# Uncomment the two lines below to copy the requirements.txt file to the container, and +# install the dependencies. +# COPY --chown=airflow:root $REQUIREMENTS_PATH /opt/airflow/requirements.txt +# RUN pip install --no-cache-dir -r /opt/airflow/requirements.txt + + +## AWS Authentication +# The image requires access to AWS services. 
This Dockerfile supports 2 ways to authenticate with AWS. +# The first is using build arguments where you can provide the AWS credentials as arguments +# passed when building the image. The other option is to copy the ~/.aws folder to the container, +# and authenticate using the credentials in that folder. +# If you would like to use an alternative method of authentication, feel free to make the +# necessary changes to this file. + +# Use these arguments to provide AWS authentication information +ARG aws_access_key_id +ARG aws_secret_access_key +ARG aws_default_region +ARG aws_session_token + +ENV AWS_ACCESS_KEY_ID=$aws_access_key_id +ENV AWS_SECRET_ACCESS_KEY=$aws_secret_access_key +ENV AWS_DEFAULT_REGION=$aws_default_region +ENV AWS_SESSION_TOKEN=$aws_session_token + +# Uncomment the line below to authenticate to AWS using the ~/.aws folder +# Keep in mind the docker build context when placing .aws folder +# COPY --chown=airflow:root ./.aws /home/airflow/.aws Review Comment: This pattern also seems problematic 🤷♂️ ########## airflow/providers/amazon/aws/executors/ecs/Dockerfile: ########## @@ -0,0 +1,86 @@ +# hadolint ignore=DL3007 +FROM apache/airflow:latest +USER root +RUN apt-get update \ + && apt-get install -y --no-install-recommends unzip \ + # The below helps to keep the image size down + && apt-get clean \ + && rm -rf /var/lib/apt/lists/* +RUN curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip" +RUN unzip awscliv2.zip && ./aws/install + +# Add a script to run the aws s3 sync command when the container is run +COPY <<"EOF" /entrypoint.sh +#!/bin/bash + +echo "Downloading DAGs from S3 bucket" +aws s3 sync "$S3_URL" "$CONTAINER_DAG_PATH" + +exec "$@" +EOF + +RUN chmod +x /entrypoint.sh + +USER airflow + +## Installing Python Dependencies +# Python dependencies can be installed by providing a requirements.txt. +# If the file is in a different location, use the requirements_path build argument to specify +# the file path. 
+ARG requirements_path=./requirements.txt +ENV REQUIREMENTS_PATH=$requirements_path Review Comment: Should these be commented out as well, like the section below? ########## airflow/providers/amazon/aws/executors/ecs/Dockerfile: ########## @@ -0,0 +1,86 @@ +# hadolint ignore=DL3007 +FROM apache/airflow:latest +USER root +RUN apt-get update \ + && apt-get install -y --no-install-recommends unzip \ + # The below helps to keep the image size down + && apt-get clean \ + && rm -rf /var/lib/apt/lists/* +RUN curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip" +RUN unzip awscliv2.zip && ./aws/install + +# Add a script to run the aws s3 sync command when the container is run +COPY <<"EOF" /entrypoint.sh +#!/bin/bash + +echo "Downloading DAGs from S3 bucket" +aws s3 sync "$S3_URL" "$CONTAINER_DAG_PATH" + +exec "$@" +EOF + +RUN chmod +x /entrypoint.sh + +USER airflow + +## Installing Python Dependencies +# Python dependencies can be installed by providing a requirements.txt. +# If the file is in a different location, use the requirements_path build argument to specify +# the file path. +ARG requirements_path=./requirements.txt +ENV REQUIREMENTS_PATH=$requirements_path + +# Uncomment the two lines below to copy the requirements.txt file to the container, and +# install the dependencies. +# COPY --chown=airflow:root $REQUIREMENTS_PATH /opt/airflow/requirements.txt +# RUN pip install --no-cache-dir -r /opt/airflow/requirements.txt + + +## AWS Authentication +# The image requires access to AWS services. This Dockerfile supports 2 ways to authenticate with AWS. +# The first is using build arguments where you can provide the AWS credentials as arguments +# passed when building the image. The other option is to copy the ~/.aws folder to the container, +# and authenticate using the credentials in that folder. +# If you would like to use an alternative method of authentication, feel free to make the +# necessary changes to this file. 
+ +# Use these arguments to provide AWS authentication information +ARG aws_access_key_id +ARG aws_secret_access_key +ARG aws_default_region +ARG aws_session_token + +ENV AWS_ACCESS_KEY_ID=$aws_access_key_id +ENV AWS_SECRET_ACCESS_KEY=$aws_secret_access_key +ENV AWS_DEFAULT_REGION=$aws_default_region +ENV AWS_SESSION_TOKEN=$aws_session_token + +# Uncomment the line below to authenticate to AWS using the ~/.aws folder +# Keep in mind the docker build context when placing .aws folder +# COPY --chown=airflow:root ./.aws /home/airflow/.aws + + +## Loading DAGs +# This Dockerfile supports 2 ways to load DAGs onto the container. +# One is to upload all the DAGs onto an S3 bucket, and then +# download them onto the container. The other is to copy a local folder with +# the DAGs onto the container. +# If you would like to use an alternative method of loading DAGs, feel free to make the +# necessary changes to this file. + +ARG host_dag_path=./dags +ENV HOST_DAG_PATH=$host_dag_path +# Set host_dag_path to the path of the DAGs on the host +# COPY --chown=airflow:root $HOST_DAG_PATH $CONTAINER_DAG_PATH + + +# If using S3 bucket as source of DAGs, uncommenting the next ENTRYPOINT command will overwrite this one. +ENTRYPOINT [] + +# Use these arguments to load DAGs onto the container from S3 +ARG s3_url +ENV S3_URL=$s3_url +ARG container_dag_path=/opt/airflow/dags +ENV CONTAINER_DAG_PATH=$container_dag_path +# Uncomment the line if using S3 bucket as the source of DAGs +# ENTRYPOINT ["/entrypoint.sh"] Review Comment: This would mean the OSS entrypoint is skipped. Should we wrap it instead? ########## airflow/providers/amazon/aws/executors/ecs/README.md: ########## @@ -0,0 +1,196 @@ +<!-- + Licensed to the Apache Software Foundation (ASF) under one + or more contributor license agreements. See the NOTICE file + distributed with this work for additional information + regarding copyright ownership. 
The ASF licenses this file + to you under the Apache License, Version 2.0 (the + "License"); you may not use this file except in compliance + with the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, + software distributed under the License is distributed on an + "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + KIND, either express or implied. See the License for the + specific language governing permissions and limitations + under the License. + --> + +# AWS ECS Executor + +This is an Airflow executor powered by Amazon Elastic Container Service (ECS). Each task that Airflow schedules for execution is run within its own ECS container. Some benefits of an executor like this include: + +1. Task isolation: No task can be a noisy neighbor for another. Resources like CPU, memory and disk are isolated to each individual task. Any actions or failures which affect networking or fail the entire container only affect the single task running in it. No single user can overload the environment by triggering too many tasks, because there are no shared workers. +2. Customized environments: You can build different container images which incorporate specific dependencies (such as system level dependencies), binaries, or data required for a task to run. +3. Cost effective: Compute resources only exist for the lifetime of the Airflow task itself. This saves costs by not requiring persistent/long lived workers ready at all times, which also need maintenance and patching. + +For a quick start guide please see [here](Setup_guide.md); it will get you up and running with a basic configuration. + +The sections below provide more general details about configuration, the provided example Dockerfile, and logging.
+ +## Config Options + +There are a number of configuration options available, which can either be set directly in the airflow.cfg +file under an "aws_ecs_executor" section or via environment variables using the `AIRFLOW__AWS_ECS_EXECUTOR__<OPTION_NAME>` +format, for example `AIRFLOW__AWS_ECS_EXECUTOR__CONTAINER_NAME = "myEcsContainer"`. For more information +on how to set these options, see [Setting Configuration Options](https://airflow.apache.org/docs/apache-airflow/stable/howto/set-config.html) + +In the case of conflicts, the order of precedence is: + +1. Load default values for options which have defaults. +2. Load any values provided in the RUN_TASK_KWARGS option if one is provided. +3. Load any values explicitly provided through airflow.cfg or environment variables. These are checked with Airflow's config precedence. + +### Required config options: + +- CLUSTER - Name of the Amazon ECS Cluster. Required. +- CONTAINER_NAME - Name of the container that will be used to execute Airflow tasks via the ECS executor. +The container should be specified in the ECS Task Definition. Required. +- REGION - The name of the AWS Region where Amazon ECS is configured. Required. + +### Optional config options: + +- ASSIGN_PUBLIC_IP - Whether to assign a public IP address to the containers launched by the ECS executor. Defaults to "False". +- CONN_ID - The Airflow connection (i.e. credentials) used by the ECS executor to make API calls to AWS ECS. Defaults to "aws_default". +- LAUNCH_TYPE - Launch type can either be 'FARGATE' OR 'EC2'. Defaults to "FARGATE". +- PLATFORM_VERSION - The platform version the ECS task uses if the FARGATE launch type is used. Defaults to "LATEST". +- RUN_TASK_KWARGS - A JSON string containing arguments to provide the ECS `run_task` API. +- SECURITY_GROUPS - Up to 5 comma-separated security group IDs associated with the ECS task. Defaults to the VPC default. +- SUBNETS - Up to 16 comma-separated subnet IDs associated with the ECS task or service.
Defaults to the VPC default. +- TASK_DEFINITION - The family and revision (family:revision) or full ARN of the ECS task definition to run. Defaults to the latest ACTIVE revision. +- MAX_RUN_TASK_ATTEMPTS - The maximum number of times the ECS Executor should attempt to run a task. + +For a more detailed description of available options, including type hints and examples, see the `config_templates` folder in the Amazon provider package. + +## Dockerfile for ECS Executor + +An example Dockerfile can be found [here](Dockerfile); it creates an image that can be used on an ECS container to run Airflow tasks using the AWS ECS Executor in Apache Airflow. The image +supports AWS CLI/API integration, allowing you to interact with AWS services within your Airflow environment. It also includes options to load DAGs (Directed Acyclic Graphs) from either an S3 bucket or a local folder. + +### Base Image + +The Docker image is built upon the `apache/airflow:latest` image. See [here](https://hub.docker.com/r/apache/airflow) for more information about the image. + +Important note: The python version in this image must match the python version on the host/container which is running the Airflow scheduler process (which in turn runs the executor). The python version of the image can be verified by running the container, and printing the python version as follows: + +``` +docker run <image_name> python --version +``` + +Ensure that this version matches the python version of the host/container which is running the Airflow scheduler process (and thus, the ECS executor). Apache Airflow images with specific python versions can be downloaded from the Dockerhub registry by filtering tags by the [python version](https://hub.docker.com/r/apache/airflow/tags?page=1&name=3.8). For example, the tag `latest-python3.8` specifies that the image will have python 3.8 installed. + +### Prerequisites + +Docker must be installed on your system.
Instructions for installing Docker can be found [here](https://docs.docker.com/get-docker/). + +### AWS Credentials + +The [AWS CLI](https://aws.amazon.com/cli/) is installed within the container, and there are multiple ways to pass AWS authentication information to the container. This guide will cover 2 methods. + +The first method is to use the build-time arguments (`aws_access_key_id`, `aws_secret_access_key`, `aws_default_region`, and `aws_session_token`). +To pass AWS authentication information using these arguments, use the `--build-arg` option during the Docker build process. For example: + +``` +docker build -t my-airflow-image \ + --build-arg aws_access_key_id=YOUR_ACCESS_KEY \ + --build-arg aws_secret_access_key=YOUR_SECRET_KEY \ + --build-arg aws_default_region=YOUR_DEFAULT_REGION \ + --build-arg aws_session_token=YOUR_SESSION_TOKEN . +``` + +Replace `YOUR_ACCESS_KEY`, `YOUR_SECRET_KEY`, `YOUR_SESSION_TOKEN`, and `YOUR_DEFAULT_REGION` with valid AWS credentials. + +Alternatively, you can authenticate to AWS using the `~/.aws` folder. See instructions on how to generate this folder [here](https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-files.html). Uncomment the line in the Dockerfile to copy the `./.aws` folder from your host machine to the container's `/home/airflow/.aws` directory. Keep in mind the Docker build context when copying the `.aws` folder to the container. + +### Loading DAGs + +There are many ways to load DAGs on the ECS container. This Dockerfile is preconfigured with two possible ways: copying from a local folder, or downloading from an S3 bucket. Other methods of loading DAGs are possible as well. + +#### From S3 Bucket + +To load DAGs from an S3 bucket, uncomment the entrypoint line in the Dockerfile to synchronize the DAGs from the specified S3 bucket to the `/opt/airflow/dags` directory inside the container. 
You can optionally provide `container_dag_path` as a build argument if you want to store the DAGs in a directory other than `/opt/airflow/dags`. + +Add `--build-arg s3_url=YOUR_S3_URL` in the docker build command. +Replace `YOUR_S3_URL` with the URL of your S3 bucket. Make sure you have the appropriate permissions to read from the bucket. + +Note that the following command is also passing in AWS credentials as build arguments. + +``` +docker build -t my-airflow-image \ + --build-arg aws_access_key_id=YOUR_ACCESS_KEY \ + --build-arg aws_secret_access_key=YOUR_SECRET_KEY \ + --build-arg aws_default_region=YOUR_DEFAULT_REGION \ + --build-arg aws_session_token=YOUR_SESSION_TOKEN \ + --build-arg s3_url=YOUR_S3_URL . +``` + + +#### From Local Folder + +To load DAGs from a local folder, place your DAG files in a folder within the docker build context on your host machine, and provide the location of the folder using the `host_dag_path` build argument. By default, the DAGs will be copied to `/opt/airflow/dags`, but this can be changed by passing the `container_dag_path` build-time argument during the Docker build process: + +``` +docker build -t my-airflow-image --build-arg host_dag_path=./dags_on_host --build-arg container_dag_path=/path/on/container . +``` + +If choosing to load DAGs onto a different path than `/opt/airflow/dags`, then the new path will need to be updated in the Airflow config. + +#### Mounting a Volume Review Comment: Not sure why we have this section in the readme? What does volume mounting a dir in a local container have to do with the ECS executor? ########## airflow/providers/amazon/aws/executors/ecs/README.md: ########## @@ -0,0 +1,196 @@ +<!-- + Licensed to the Apache Software Foundation (ASF) under one + or more contributor license agreements. See the NOTICE file + distributed with this work for additional information + regarding copyright ownership. 
The ASF licenses this file + to you under the Apache License, Version 2.0 (the + "License"); you may not use this file except in compliance + with the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, + software distributed under the License is distributed on an + "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + KIND, either express or implied. See the License for the + specific language governing permissions and limitations + under the License. + --> + +# AWS ECS Executor + +This is an Airflow executor powered by Amazon Elastic Container Service (ECS). Each task that Airflow schedules for execution is run within its own ECS container. Some benefits of an executor like this include: + +1. Task isolation: No task can be a noisy neighbor for another. Resources like CPU, memory and disk are isolated to each individual task. Any actions or failures which affect networking or fail the entire container only affect the single task running in it. No single user can overload the environment by triggering too many tasks, because there are no shared workers. +2. Customized environments: You can build different container images which incorporate specific dependencies (such as system level dependencies), binaries, or data required for a task to run. +3. Cost effective: Compute resources only exist for the lifetime of the Airflow task itself. This saves costs by not requiring persistent/long lived workers ready at all times, which also need maintenance and patching. + +For a quick start guide please see [here](Setup_guide.md), it will get you up and running with a basic configuration. + +The below sections provide more generic details about configuration, the provided example Dockerfile and logging. 
+ +## Config Options + +There are a number of configuration options available, which can either be set directly in the airflow.cfg +file under an "aws_ecs_executor" section or via environment variables using the `AIRFLOW__AWS_ECS_EXECUTOR__<OPTION_NAME>` +format, for example `AIRFLOW__AWS_ECS_EXECUTOR__CONTAINER_NAME = "myEcsContainer"`. For more information +on how to set these options, see [Setting Configuration Options](https://airflow.apache.org/docs/apache-airflow/stable/howto/set-config.html) + +In the case of conflicts, the order of precedence is: + +1. Load default values for options which have defaults. +2. Load any values provided in the RUN_TASK_KWARGS option if one is provided. +3. Load any values explicitly provided through airflow.cfg or environment variables. These are checked with Airflow's config precedence. + +### Required config options: + +- CLUSTER - Name of the Amazon ECS Cluster. Required. +- CONTAINER_NAME - Name of the container that will be used to execute Airflow tasks via the ECS executor. +The container should be specified in the ECS Task Definition. Required. +- REGION - The name of the AWS Region where Amazon ECS is configured. Required. + +### Optional config options: + +- ASSIGN_PUBLIC_IP - "Whether to assign a public IP address to the containers launched by the ECS executor. Defaults to "False". +- CONN_ID - The Airflow connection (i.e. credentials) used by the ECS executor to make API calls to AWS ECS. Defaults to "aws_default". +- LAUNCH_TYPE - Launch type can either be 'FARGATE' OR 'EC2'. Defaults to "FARGATE". +- PLATFORM_VERSION - The platform version the ECS task uses if the FARGATE launch type is used. Defaults to "LATEST". +- RUN_TASK_KWARGS - A JSON string containing arguments to provide the ECS `run_task` API. +- SECURITY_GROUPS - Up to 5 comma-seperated security group IDs associated with the ECS task. Defaults to the VPC default. +- SUBNETS - Up to 16 comma-separated subnet IDs associated with the ECS task or service. 
Defaults to the VPC default. +- TASK_DEFINITION - The family and revision (family:revision) or full ARN of the ECS task definition to run. Defaults to the latest ACTIVE revision. +- MAX_RUN_TASK_ATTEMPTS - The maximum number of times the ECS Executor should attempt to run a task. + +For a more detailed description of available options, including type hints and examples, see the `config_templates` folder in the Amazon provider package. + +## Dockerfile for ECS Executor + +An example Dockerfile can be found [here](Dockerfile); it creates an image that can be used on an ECS container to run Airflow tasks using the AWS ECS Executor in Apache Airflow. The image +supports AWS CLI/API integration, allowing you to interact with AWS services within your Airflow environment. It also includes options to load DAGs (Directed Acyclic Graphs) from either an S3 bucket or a local folder. + +### Base Image + +The Docker image is built upon the `apache/airflow:latest` image. See [here](https://hub.docker.com/r/apache/airflow) for more information about the image. + +Important note: The python version in this image must match the python version on the host/container which is running the Airflow scheduler process (which in turn runs the executor). The python version of the image can be verified by running the container and printing the python version as follows: + +``` +docker run <image_name> python --version +``` + +Ensure that this version matches the python version of the host/container which is running the Airflow scheduler process (and thus, the ECS executor). Apache Airflow images with specific python versions can be downloaded from the Docker Hub registry by filtering tags by [python version](https://hub.docker.com/r/apache/airflow/tags?page=1&name=3.8). For example, the tag `latest-python3.8` specifies that the image will have python 3.8 installed. + +### Prerequisites + +Docker must be installed on your system.
Instructions for installing Docker can be found [here](https://docs.docker.com/get-docker/). + +### AWS Credentials + +The [AWS CLI](https://aws.amazon.com/cli/) is installed within the container, and there are multiple ways to pass AWS authentication information to the container. This guide will cover 2 methods. + +The first method is to use the build-time arguments (`aws_access_key_id`, `aws_secret_access_key`, `aws_default_region`, and `aws_session_token`). +To pass AWS authentication information using these arguments, use the `--build-arg` option during the Docker build process. For example: + +``` +docker build -t my-airflow-image \ + --build-arg aws_access_key_id=YOUR_ACCESS_KEY \ + --build-arg aws_secret_access_key=YOUR_SECRET_KEY \ + --build-arg aws_default_region=YOUR_DEFAULT_REGION \ + --build-arg aws_session_token=YOUR_SESSION_TOKEN . +``` + +Replace `YOUR_ACCESS_KEY`, `YOUR_SECRET_KEY`, `YOUR_SESSION_TOKEN`, and `YOUR_DEFAULT_REGION` with valid AWS credentials. + +Alternatively, you can authenticate to AWS using the `~/.aws` folder. See instructions on how to generate this folder [here](https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-files.html). Uncomment the line in the Dockerfile to copy the `./.aws` folder from your host machine to the container's `/home/airflow/.aws` directory. Keep in mind the Docker build context when copying the `.aws` folder to the container. + +### Loading DAGs + +There are many ways to load DAGs on the ECS container. This Dockerfile is preconfigured with two possible ways: copying from a local folder, or downloading from an S3 bucket. Other methods of loading DAGs are possible as well. + +#### From S3 Bucket + +To load DAGs from an S3 bucket, uncomment the entrypoint line in the Dockerfile to synchronize the DAGs from the specified S3 bucket to the `/opt/airflow/dags` directory inside the container. 
You can optionally provide `container_dag_path` as a build argument if you want to store the DAGs in a directory other than `/opt/airflow/dags`. + +Add `--build-arg s3_url=YOUR_S3_URL` in the docker build command. +Replace `YOUR_S3_URL` with the URL of your S3 bucket. Make sure you have the appropriate permissions to read from the bucket. + +Note that the following command is also passing in AWS credentials as build arguments. + +``` +docker build -t my-airflow-image \ + --build-arg aws_access_key_id=YOUR_ACCESS_KEY \ + --build-arg aws_secret_access_key=YOUR_SECRET_KEY \ + --build-arg aws_default_region=YOUR_DEFAULT_REGION \ + --build-arg aws_session_token=YOUR_SESSION_TOKEN \ + --build-arg s3_url=YOUR_S3_URL . +``` + + +#### From Local Folder + +To load DAGs from a local folder, place your DAG files in a folder within the docker build context on your host machine, and provide the location of the folder using the `host_dag_path` build argument. By default, the DAGs will be copied to `/opt/airflow/dags`, but this can be changed by passing the `container_dag_path` build-time argument during the Docker build process: + +``` +docker build -t my-airflow-image --build-arg host_dag_path=./dags_on_host --build-arg container_dag_path=/path/on/container . +``` + +If choosing to load DAGs onto a different path than `/opt/airflow/dags`, then the new path will need to be updated in the Airflow config. + +#### Mounting a Volume + +You can optionally mount a local directory as a volume on the container at runtime. This will allow you to make changes to files on the mounted directory, and have those changes be reflected in the container. To do this, run the following command: + +``` +docker run --volume /abs/path/to/local/dir:/abs/path/to/remote/dir <image_name> +``` + +Note: Doing this will overwrite the contents of the directory on the container with the contents of the local directory.
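A common use of the volume approach is live-editing DAGs during development. As an illustrative sketch (the host path and the `my-airflow-image` tag are placeholders from the earlier build examples), the local DAGs folder can be mounted over the container's default DAG directory:

```
docker run --volume /path/to/local/dags:/opt/airflow/dags my-airflow-image
```

Changes saved to the local folder then appear inside the running container without rebuilding the image.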
+ +### Installing Python Dependencies + +This Dockerfile supports installing Python dependencies via `pip` from a `requirements.txt` file. Place your `requirements.txt` file in the same directory as the Dockerfile. If it is in a different location, it can be specified using the `requirements_path` build argument. Keep in mind the Docker context when copying the `requirements.txt` file. Uncomment the two appropriate lines in the Dockerfile that copy the `requirements.txt` file to the container, and run `pip install` to install the dependencies on the container. + +### Building Image for ECS Executor + +Detailed instructions on how to use the Docker image that you have created via this readme with the ECS Executor can be found [here](link_to_how_to_guide). + + +## Logging + +Airflow tasks executed via this executor run in ECS containers within the configured VPC. This means that logs are not directly accessible to the Airflow Webserver, and when containers are stopped after task completion, the logs are permanently lost. + +Remote logging can be employed when using the ECS executor to persist your Airflow Task logs and make them viewable from the Airflow Webserver. + +### Configuring Remote Logging + +There are many ways to configure remote logging and several supported destinations. A general overview of Airflow Task logging can be found [here](https://airflow.apache.org/docs/apache-airflow/stable/administration-and-deployment/logging-monitoring/logging-tasks.html). Instructions for configuring S3 remote logging can be found [here](https://airflow.apache.org/docs/apache-airflow-providers-amazon/stable/logging/s3-task-handler.html) and CloudWatch remote logging [here](https://airflow.apache.org/docs/apache-airflow-providers-amazon/stable/logging/cloud-watch-task-handlers.html).
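As a sketch of what S3 remote logging could look like when configured through environment variables (the bucket name and prefix are placeholders; the option names are standard Airflow `[logging]` options):

```shell
# Enable remote logging and point it at an S3 prefix (placeholder bucket name)
export AIRFLOW__LOGGING__REMOTE_LOGGING=True
export AIRFLOW__LOGGING__REMOTE_BASE_LOG_FOLDER=s3://my-airflow-logs/ecs-executor
# Airflow connection holding the AWS credentials used by the log handler
export AIRFLOW__LOGGING__REMOTE_LOG_CONN_ID=aws_default
```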
+Some important things to point out for remote logging in the context of the ECS executor: + + - The configuration options for Airflow remote logging must be configured on the host running the Airflow Webserver (so that it can fetch logs from the remote location) as well as within the ECS container running the Airflow Tasks (so that it can upload the logs to the remote location). See [here](https://airflow.apache.org/docs/apache-airflow/stable/howto/set-config.html) to read more about how to set Airflow configuration via config file or environment variable exports. Review Comment: Config should be consistent everywhere, not sure why we need to call this out explicitly. See this in the [config ref](https://airflow.apache.org/docs/apache-airflow/stable/configurations-ref.html): > Use the same configuration across all the Airflow components. ########## airflow/providers/amazon/aws/executors/ecs/Setup_guide.md: ########## @@ -0,0 +1,148 @@ +<!-- + Licensed to the Apache Software Foundation (ASF) under one + or more contributor license agreements. See the NOTICE file + distributed with this work for additional information + regarding copyright ownership. The ASF licenses this file + to you under the Apache License, Version 2.0 (the + "License"); you may not use this file except in compliance + with the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, + software distributed under the License is distributed on an + "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + KIND, either express or implied. See the License for the + specific language governing permissions and limitations + under the License. + --> + +# Setting up an ECS Executor for Apache Airflow + +There are 3 steps involved in getting an ECS Executor to work in Apache Airflow: + +1. Creating a database that Airflow and the Executor can connect to. +2.
Creating and configuring an ECS Cluster that can run tasks from Airflow. +3. Configuring Airflow to use the ECS Executor and the database. + +There are different options for selecting a database backend. See [here](https://airflow.apache.org/docs/apache-airflow/stable/howto/set-up-database.html) for more information about the different options supported by Airflow. The following guide will explain how to set up a PostgreSQL RDS Instance on AWS. The guide will also cover setting up an ECS cluster. The ECS Executor supports various launch types, but this guide will explain how to set up an ECS Fargate cluster. + +## Setting up an RDS DB Instance for ECS Executors + +### Create the RDS DB Instance + +1. Log in to your AWS Management Console and navigate to the RDS service. +2. Click "Create database" to start creating a new RDS instance. +3. Choose the "Standard create" option, and select PostgreSQL. +4. Select the appropriate template, availability and durability. + - NOTE: At the time of this writing, the "Multi-AZ DB **Cluster**" option does not support setting the database name, which is a required step below. +5. Set the DB Instance name, the username and password. +6. Choose the instance configuration and storage parameters. +7. In the Connectivity section, select Don't connect to an EC2 compute resource. +8. Select or create a VPC and subnet, and allow public access to the DB. Select or create a security group and select the Availability Zone. +9. Open the Additional Configuration tab and set the database name to `airflow_db`. +10. Select other settings as required, and create the database by clicking Create database. + + +### Test Connectivity + +In order to be able to connect to the new RDS instance, you need to allow inbound traffic to the database from your IP address. + + +1. Under the "Security" heading in the "Connectivity & security" tab of the RDS instance, find the link to the VPC security group for your new RDS DB instance. +2.
Create an inbound rule that allows traffic from your IP address(es) on TCP port 5432 (PostgreSQL). + +3. Confirm that you can connect to the DB after modifying the security group. This will require having `psql` installed. Instructions for installing `psql` can be found [here](https://www.postgresql.org/download/). + +**NOTE**: Be sure that the status of your DB is Available before testing connectivity. + +``` +psql -h <endpoint> -p 5432 -U <username> <db_name> +``` + +The endpoint can be found on the "Connectivity and Security" tab, and the username (and password) are the credentials used when creating the database. +The db_name should be `airflow_db` (unless a different one was used when creating the database). + +You will be prompted to enter the password if the connection is successful. + + +## Creating an ECS Cluster with Fargate, and Task Definitions + +In order to create a Task Definition for the ECS Cluster that will work with Apache Airflow, you will need a Docker image that is properly configured. See the [Dockerfile](README.md#dockerfile-for-ecs-executor) section for instructions on how to do that. + +Once the image is built, it needs to be put in a repository where it can be pulled by ECS. There are multiple ways to accomplish this. This guide will go over doing this using Amazon Elastic Container Registry (ECR). + +### Create an ECR Repository + +1. Log in to your AWS Management Console and navigate to the ECR service. +2. Click Create repository. +3. Name the repository and fill out other information as required. +4. Click Create Repository. +5. Once the repository has been created, click on the repository. Click on the "View push commands" button on the top right. +6. Follow the instructions to push the Docker image, replacing image names as appropriate. Ensure the image is uploaded by refreshing the page once the image is pushed. + +### Create ECS Cluster + +1. Log in to your AWS Management Console and navigate to the Amazon Elastic Container Service.
+2. Click "Clusters" then click "Create Cluster". +3. Make sure that AWS Fargate (Serverless) is selected under Infrastructure. +4. Select other options as required and click Create to create the cluster. + +### Create Task Definition + +1. Click on Task Definitions on the left hand bar, and click Create new task definition. +2. Choose the Task Definition Family name. Select AWS Fargate for the Launch Type. +3. Select or create the Task Role and Task Execution Role, and ensure the roles have the required permissions to accomplish their respective tasks. You can choose to create a new Task Execution role that will have the basic minimum permissions in order for the task to run. +4. Select a name for the container, and use the image URI of the image that was pushed in the previous section. Make sure the role being used has the required permissions to pull the image. +5. Add the following environment variables to the container: + + - `AIRFLOW__DATABASE__SQL_ALCHEMY_CONN`, with the value being the PostgreSQL connection string in the following format using the values set during the [Database section](#create-the-rds-db-instance) above: + +``` +postgresql+psycopg2://<username>:<password>@<endpoint>/<database_name> +``` + + - `AIRFLOW__ECS_EXECUTOR__SECURITY_GROUPS`, with the value being a comma separated list of security group IDs associated with the VPC used for the RDS instance. + - `AIRFLOW__ECS_EXECUTOR__SUBNETS`, with the value being a comma separated list of subnet IDs of the subnets associated with the RDS instance. + +6. Add other configuration as necessary for Airflow generally ([see here](https://airflow.apache.org/docs/apache-airflow/stable/configurations-ref.html)), the ECS executor ([see here](README.md#config-options)) or for remote logging ([see here](README.md#logging)). Review Comment: Yeah, this is the section that needs to be revamped to make sure config is consistent everywhere! 
########## airflow/providers/amazon/aws/config_templates/config.yml: ########## @@ -0,0 +1,131 @@ +# Licensed to the Apache Software Foundation (ASF) under one Review Comment: Isn't this meant to be [here](https://github.com/apache/airflow/blob/8ecd576de1043dbea40e5e16b5dc34859cc41725/airflow/providers/amazon/provider.yaml#L717) instead? ########## airflow/providers/amazon/aws/executors/ecs/Dockerfile: ########## @@ -0,0 +1,86 @@ +# hadolint ignore=DL3007 +FROM apache/airflow:latest Review Comment: We might consider a different home for this? It's not obvious this is an example to me. Maybe docs, or at least a different path that'll better reflect it? e.g. https://github.com/apache/airflow/blob/main/docs/apache-airflow/howto/docker-compose/docker-compose.yaml ########## airflow/providers/amazon/aws/executors/ecs/README.md: ########## @@ -0,0 +1,196 @@ +<!-- + Licensed to the Apache Software Foundation (ASF) under one + or more contributor license agreements. See the NOTICE file + distributed with this work for additional information + regarding copyright ownership. The ASF licenses this file + to you under the Apache License, Version 2.0 (the + "License"); you may not use this file except in compliance + with the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, + software distributed under the License is distributed on an + "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + KIND, either express or implied. See the License for the + specific language governing permissions and limitations + under the License. + --> + +# AWS ECS Executor + +This is an Airflow executor powered by Amazon Elastic Container Service (ECS). Each task that Airflow schedules for execution is run within its own ECS container. Some benefits of an executor like this include: + +1. Task isolation: No task can be a noisy neighbor for another. 
Resources like CPU, memory, and disk are isolated to each individual task. Any actions or failures which affect networking or fail the entire container only affect the single task running in it. No single user can overload the environment by triggering too many tasks, because there are no shared workers. +2. Customized environments: You can build different container images which incorporate specific dependencies (such as system-level dependencies), binaries, or data required for a task to run. +3. Cost effective: Compute resources only exist for the lifetime of the Airflow task itself. This saves costs by not requiring persistent/long-lived workers ready at all times, which also need maintenance and patching. + +For a quick start guide, see [here](Setup_guide.md); it will get you up and running with a basic configuration. + +The sections below provide more general details about configuration, the provided example Dockerfile, and logging. + +## Config Options + +There are a number of configuration options available, which can either be set directly in the airflow.cfg +file under an "aws_ecs_executor" section or via environment variables using the `AIRFLOW__AWS_ECS_EXECUTOR__<OPTION_NAME>` +format, for example `AIRFLOW__AWS_ECS_EXECUTOR__CONTAINER_NAME = "myEcsContainer"`. For more information +on how to set these options, see [Setting Configuration Options](https://airflow.apache.org/docs/apache-airflow/stable/howto/set-config.html). + +In the case of conflicts, the order of precedence is: + +1. Load default values for options which have defaults. +2. Load any values provided in the RUN_TASK_KWARGS option if one is provided. +3. Load any values explicitly provided through airflow.cfg or environment variables. These are checked with Airflow's config precedence. + +### Required config options: + +- CLUSTER - Name of the Amazon ECS Cluster. Required. +- CONTAINER_NAME - Name of the container that will be used to execute Airflow tasks via the ECS executor.
+The container should be specified in the ECS Task Definition. Required. +- REGION - The name of the AWS Region where Amazon ECS is configured. Required. + +### Optional config options: + +- ASSIGN_PUBLIC_IP - Whether to assign a public IP address to the containers launched by the ECS executor. Defaults to "False". +- CONN_ID - The Airflow connection (i.e. credentials) used by the ECS executor to make API calls to AWS ECS. Defaults to "aws_default". +- LAUNCH_TYPE - Launch type can either be 'FARGATE' or 'EC2'. Defaults to "FARGATE". +- PLATFORM_VERSION - The platform version the ECS task uses if the FARGATE launch type is used. Defaults to "LATEST". +- RUN_TASK_KWARGS - A JSON string containing arguments to provide to the ECS `run_task` API. +- SECURITY_GROUPS - Up to 5 comma-separated security group IDs associated with the ECS task. Defaults to the VPC default. +- SUBNETS - Up to 16 comma-separated subnet IDs associated with the ECS task or service. Defaults to the VPC default. +- TASK_DEFINITION - The family and revision (family:revision) or full ARN of the ECS task definition to run. Defaults to the latest ACTIVE revision. +- MAX_RUN_TASK_ATTEMPTS - The maximum number of times the ECS Executor should attempt to run a task. + +For a more detailed description of available options, including type hints and examples, see the `config_templates` folder in the Amazon provider package. + +## Dockerfile for ECS Executor + +An example Dockerfile can be found [here](Dockerfile); it creates an image that can be used on an ECS container to run Airflow tasks using the AWS ECS Executor in Apache Airflow. The image +supports AWS CLI/API integration, allowing you to interact with AWS services within your Airflow environment. It also includes options to load DAGs (Directed Acyclic Graphs) from either an S3 bucket or a local folder. + +### Base Image + +The Docker image is built upon the `apache/airflow:latest` image.
See [here](https://hub.docker.com/r/apache/airflow) for more information about the image. + +Important note: The python version in this image must match the python version on the host/container which is running the Airflow scheduler process (which in turn runs the executor). The python version of the image can be verified by running the container and printing the python version as follows: + +``` +docker run <image_name> python --version +``` + +Ensure that this version matches the python version of the host/container which is running the Airflow scheduler process (and thus, the ECS executor). Apache Airflow images with specific python versions can be downloaded from the Docker Hub registry by filtering tags by [python version](https://hub.docker.com/r/apache/airflow/tags?page=1&name=3.8). For example, the tag `latest-python3.8` specifies that the image will have python 3.8 installed. + +### Prerequisites + +Docker must be installed on your system. Instructions for installing Docker can be found [here](https://docs.docker.com/get-docker/). + +### AWS Credentials + +The [AWS CLI](https://aws.amazon.com/cli/) is installed within the container, and there are multiple ways to pass AWS authentication information to the container. This guide will cover 2 methods. + +The first method is to use the build-time arguments (`aws_access_key_id`, `aws_secret_access_key`, `aws_default_region`, and `aws_session_token`). +To pass AWS authentication information using these arguments, use the `--build-arg` option during the Docker build process. For example: + +``` +docker build -t my-airflow-image \ + --build-arg aws_access_key_id=YOUR_ACCESS_KEY \ + --build-arg aws_secret_access_key=YOUR_SECRET_KEY \ + --build-arg aws_default_region=YOUR_DEFAULT_REGION \ + --build-arg aws_session_token=YOUR_SESSION_TOKEN . +``` + +Replace `YOUR_ACCESS_KEY`, `YOUR_SECRET_KEY`, `YOUR_SESSION_TOKEN`, and `YOUR_DEFAULT_REGION` with valid AWS credentials.
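To sanity-check that the baked-in credentials actually work (assuming the image was built as above; `my-airflow-image` is the example tag), the AWS CLI inside the container can be asked which identity it is authenticated as:

```
docker run my-airflow-image aws sts get-caller-identity
```

A successful call prints the account, user ID, and ARN associated with the credentials.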
+ +Alternatively, you can authenticate to AWS using the `~/.aws` folder. See instructions on how to generate this folder [here](https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-files.html). Uncomment the line in the Dockerfile to copy the `./.aws` folder from your host machine to the container's `/home/airflow/.aws` directory. Keep in mind the Docker build context when copying the `.aws` folder to the container. + +### Loading DAGs + +There are many ways to load DAGs on the ECS container. This Dockerfile is preconfigured with two possible ways: copying from a local folder, or downloading from an S3 bucket. Other methods of loading DAGs are possible as well. + +#### From S3 Bucket + +To load DAGs from an S3 bucket, uncomment the entrypoint line in the Dockerfile to synchronize the DAGs from the specified S3 bucket to the `/opt/airflow/dags` directory inside the container. You can optionally provide `container_dag_path` as a build argument if you want to store the DAGs in a directory other than `/opt/airflow/dags`. + +Add `--build-arg s3_url=YOUR_S3_URL` in the docker build command. +Replace `YOUR_S3_URL` with the URL of your S3 bucket. Make sure you have the appropriate permissions to read from the bucket. + +Note that the following command is also passing in AWS credentials as build arguments. + +``` +docker build -t my-airflow-image \ + --build-arg aws_access_key_id=YOUR_ACCESS_KEY \ + --build-arg aws_secret_access_key=YOUR_SECRET_KEY \ + --build-arg aws_default_region=YOUR_DEFAULT_REGION \ + --build-arg aws_session_token=YOUR_SESSION_TOKEN \ + --build-arg s3_url=YOUR_S3_URL . +``` + + +#### From Local Folder + +To load DAGs from a local folder, place your DAG files in a folder within the docker build context on your host machine, and provide the location of the folder using the `host_dag_path` build argument. 
By default, the DAGs will be copied to `/opt/airflow/dags`, but this can be changed by passing the `container_dag_path` build-time argument during the Docker build process: + +``` +docker build -t my-airflow-image --build-arg host_dag_path=./dags_on_host --build-arg container_dag_path=/path/on/container . +``` + +If choosing to load DAGs onto a different path than `/opt/airflow/dags`, then the new path will need to be updated in the Airflow config. + +#### Mounting a Volume + +You can optionally mount a local directory as a volume on the container at runtime. This will allow you to make changes to files on the mounted directory, and have those changes be reflected in the container. To do this, run the following command: + +``` +docker run --volume /abs/path/to/local/dir:/abs/path/to/remote/dir <image_name> +``` + +Note: Doing this will overwrite the contents of the directory on the container with the contents of the local directory. + +### Installing Python Dependencies + +This Dockerfile supports installing Python dependencies via `pip` from a `requirements.txt` file. Place your `requirements.txt` file in the same directory as the Dockerfile. If it is in a different location, it can be specified using the `requirements_path` build argument. Keep in mind the Docker context when copying the `requirements.txt` file. Uncomment the two appropriate lines in the Dockerfile that copy the `requirements.txt` file to the container, and run `pip install` to install the dependencies on the container. + +### Building Image for ECS Executor + +Detailed instructions on how to use the Docker image that you have created via this readme with the ECS Executor can be found [here](link_to_how_to_guide). + + +## Logging + +Airflow tasks executed via this executor run in ECS containers within the configured VPC. This means that logs are not directly accessible to the Airflow Webserver, and when containers are stopped after task completion, the logs are permanently lost.
+ +Remote logging can be employed when using the ECS executor to persist your Airflow Task logs and make them viewable from the Airflow Webserver. + +### Configuring Remote Logging + +There are many ways to configure remote logging and several supported destinations. A general overview of Airflow Task logging can be found [here](https://airflow.apache.org/docs/apache-airflow/stable/administration-and-deployment/logging-monitoring/logging-tasks.html). Instructions for configuring S3 remote logging can be found [here](https://airflow.apache.org/docs/apache-airflow-providers-amazon/stable/logging/s3-task-handler.html) and Cloudwatch remote logging [here](https://airflow.apache.org/docs/apache-airflow-providers-amazon/stable/logging/cloud-watch-task-handlers.html). +Some important things to point out for remote logging in the context of the ECS executor: + + - The configuration options for Airflow remote logging must be configured on the host running the Airflow Webserver (so that it can fetch logs from the remote location) as well as within the ECS container running the Airflow Tasks (so that it can upload the logs to the remote location). See [here](https://airflow.apache.org/docs/apache-airflow/stable/howto/set-config.html) to read more about how to set Airflow configuration via config file or environment variable exports. + - Adding the Airflow remote logging config to the container can be done in many ways. Some examples include, but are not limited to: Review Comment: Related to above, we should set the expectation that however Airflow overall is configured, is also used for these ECS containers. Suggesting bespoke config for the tasks is asking for problems eventually imo. ########## airflow/providers/amazon/aws/executors/ecs/README.md: ########## @@ -0,0 +1,196 @@ +<!-- + Licensed to the Apache Software Foundation (ASF) under one + or more contributor license agreements. 
See the NOTICE file + distributed with this work for additional information + regarding copyright ownership. The ASF licenses this file + to you under the Apache License, Version 2.0 (the + "License"); you may not use this file except in compliance + with the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, + software distributed under the License is distributed on an + "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + KIND, either express or implied. See the License for the + specific language governing permissions and limitations + under the License. + --> + +# AWS ECS Executor + +This is an Airflow executor powered by Amazon Elastic Container Service (ECS). Each task that Airflow schedules for execution is run within its own ECS container. Some benefits of an executor like this include: + +1. Task isolation: No task can be a noisy neighbor for another. Resources like CPU, memory, and disk are isolated to each individual task. Any actions or failures which affect networking or fail the entire container only affect the single task running in it. No single user can overload the environment by triggering too many tasks, because there are no shared workers. +2. Customized environments: You can build different container images which incorporate specific dependencies (such as system-level dependencies), binaries, or data required for a task to run. +3. Cost effective: Compute resources only exist for the lifetime of the Airflow task itself. This saves costs by not requiring persistent/long-lived workers ready at all times, which also need maintenance and patching. + +For a quick start guide, see [here](Setup_guide.md); it will get you up and running with a basic configuration. + +The sections below provide more general details about configuration, the provided example Dockerfile, and logging.
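As a sketch of the environment-variable form described in the Config Options section below (all values here are placeholders), the required options could be exported wherever the scheduler runs:

```shell
# Required ECS executor options in the AIRFLOW__AWS_ECS_EXECUTOR__<OPTION_NAME> format
export AIRFLOW__AWS_ECS_EXECUTOR__CLUSTER="my-ecs-cluster"
export AIRFLOW__AWS_ECS_EXECUTOR__CONTAINER_NAME="myEcsContainer"
export AIRFLOW__AWS_ECS_EXECUTOR__REGION="us-east-1"
```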
+ +## Config Options + +There are a number of configuration options available, which can either be set directly in the airflow.cfg +file under an "aws_ecs_executor" section or via environment variables using the `AIRFLOW__AWS_ECS_EXECUTOR__<OPTION_NAME>` +format, for example `AIRFLOW__AWS_ECS_EXECUTOR__CONTAINER_NAME = "myEcsContainer"`. For more information +on how to set these options, see [Setting Configuration Options](https://airflow.apache.org/docs/apache-airflow/stable/howto/set-config.html). + +In the case of conflicts, the order of precedence is: + +1. Load default values for options which have defaults. +2. Load any values provided in the RUN_TASK_KWARGS option if one is provided. +3. Load any values explicitly provided through airflow.cfg or environment variables. These are checked with Airflow's config precedence. + +### Required config options: + +- CLUSTER - Name of the Amazon ECS Cluster. Required. +- CONTAINER_NAME - Name of the container that will be used to execute Airflow tasks via the ECS executor. +The container should be specified in the ECS Task Definition. Required. +- REGION - The name of the AWS Region where Amazon ECS is configured. Required. + +### Optional config options: + +- ASSIGN_PUBLIC_IP - Whether to assign a public IP address to the containers launched by the ECS executor. Defaults to "False". +- CONN_ID - The Airflow connection (i.e. credentials) used by the ECS executor to make API calls to AWS ECS. Defaults to "aws_default". +- LAUNCH_TYPE - Launch type can either be 'FARGATE' or 'EC2'. Defaults to "FARGATE". +- PLATFORM_VERSION - The platform version the ECS task uses if the FARGATE launch type is used. Defaults to "LATEST". +- RUN_TASK_KWARGS - A JSON string containing arguments to provide to the ECS `run_task` API. +- SECURITY_GROUPS - Up to 5 comma-separated security group IDs associated with the ECS task. Defaults to the VPC default. +- SUBNETS - Up to 16 comma-separated subnet IDs associated with the ECS task or service.
Defaults to the VPC default.
- TASK_DEFINITION - The family and revision (family:revision) or full ARN of the ECS task definition to run. Defaults to the latest ACTIVE revision.
- MAX_RUN_TASK_ATTEMPTS - The maximum number of times the ECS executor should attempt to run a task.

For a more detailed description of available options, including type hints and examples, see the `config_templates` folder in the Amazon provider package.

## Dockerfile for ECS Executor

An example Dockerfile can be found [here](Dockerfile). It creates an image that can be used in an ECS container to run Airflow tasks using the AWS ECS executor. The image supports AWS CLI/API integration, allowing you to interact with AWS services within your Airflow environment. It also includes options to load DAGs (Directed Acyclic Graphs) from either an S3 bucket or a local folder.

### Base Image

The Docker image is built upon the `apache/airflow:latest` image. See [here](https://hub.docker.com/r/apache/airflow) for more information about the image.

Important note: The Python version in this image must match the Python version on the host/container which is running the Airflow scheduler process (which in turn runs the executor). The Python version of the image can be verified by running the container and printing the Python version as follows:

```
docker run <image_name> python --version
```

Ensure that this version matches the Python version of the host/container which is running the Airflow scheduler process (and thus the ECS executor). Apache Airflow images with specific Python versions can be downloaded from the Dockerhub registry by filtering tags by [Python version](https://hub.docker.com/r/apache/airflow/tags?page=1&name=3.8). For example, the tag `latest-python3.8` specifies that the image will have Python 3.8 installed.

### Prerequisites

Docker must be installed on your system.
Instructions for installing Docker can be found [here](https://docs.docker.com/get-docker/).

### AWS Credentials

The [AWS CLI](https://aws.amazon.com/cli/) is installed within the container, and there are multiple ways to pass AWS authentication information to the container. This guide covers two methods.

The first method is to use the build-time arguments (`aws_access_key_id`, `aws_secret_access_key`, `aws_default_region`, and `aws_session_token`). To pass AWS authentication information using these arguments, use the `--build-arg` option during the Docker build process. For example:

```
docker build -t my-airflow-image \
 --build-arg aws_access_key_id=YOUR_ACCESS_KEY \
 --build-arg aws_secret_access_key=YOUR_SECRET_KEY \
 --build-arg aws_default_region=YOUR_DEFAULT_REGION \
 --build-arg aws_session_token=YOUR_SESSION_TOKEN .
```

Replace `YOUR_ACCESS_KEY`, `YOUR_SECRET_KEY`, `YOUR_SESSION_TOKEN`, and `YOUR_DEFAULT_REGION` with valid AWS credentials.

Alternatively, you can authenticate to AWS using the `~/.aws` folder. See instructions on how to generate this folder [here](https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-files.html). Uncomment the line in the Dockerfile that copies the `./.aws` folder from your host machine to the container's `/home/airflow/.aws` directory. Keep in mind the Docker build context when copying the `.aws` folder to the container.

### Loading DAGs

There are many ways to load DAGs on the ECS container. This Dockerfile is preconfigured with two possible ways: copying from a local folder, or downloading from an S3 bucket. Other methods of loading DAGs are possible as well.

#### From S3 Bucket

To load DAGs from an S3 bucket, uncomment the entrypoint line in the Dockerfile to synchronize the DAGs from the specified S3 bucket to the `/opt/airflow/dags` directory inside the container.
You can optionally provide `container_dag_path` as a build argument if you want to store the DAGs in a directory other than `/opt/airflow/dags`.

Add `--build-arg s3_url=YOUR_S3_URL` to the docker build command, replacing `YOUR_S3_URL` with the URL of your S3 bucket. Make sure you have the appropriate permissions to read from the bucket.

Note that the following command also passes in AWS credentials as build arguments.

```
docker build -t my-airflow-image \
 --build-arg aws_access_key_id=YOUR_ACCESS_KEY \
 --build-arg aws_secret_access_key=YOUR_SECRET_KEY \
 --build-arg aws_default_region=YOUR_DEFAULT_REGION \
 --build-arg aws_session_token=YOUR_SESSION_TOKEN \
 --build-arg s3_url=YOUR_S3_URL .
```

#### From Local Folder

To load DAGs from a local folder, place your DAG files in a folder within the Docker build context on your host machine, and provide the location of the folder using the `host_dag_path` build argument. By default, the DAGs will be copied to `/opt/airflow/dags`, but this can be changed by passing the `container_dag_path` build-time argument during the Docker build process:

```
docker build -t my-airflow-image --build-arg host_dag_path=./dags_on_host --build-arg container_dag_path=/path/on/container .
```

If you choose to load DAGs into a path other than `/opt/airflow/dags`, the new path will need to be updated in the Airflow config.

#### Mounting a Volume

You can optionally mount a local directory as a volume on the container at run time. This allows you to make changes to files in the mounted directory and have those changes reflected in the container. To do this, run the following command:

```
docker run --volume /abs/path/to/local/dir:/abs/path/to/remote/dir <image_name>
```

Note: Doing this will overwrite the contents of the directory on the container with the contents of the local directory.
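A common use of this volume mount is live DAG editing during development. As an illustrative sketch (the image name `my-airflow-image` and the host path are placeholders), the snippet below only assembles and prints the run command so the host-to-container mapping is explicit:

```
# Assemble and print a docker run command that mounts local DAGs over the
# container's default DAG directory. Image name and host path are placeholders.
HOST_DAGS="$(pwd)/dags"
CONTAINER_DAGS="/opt/airflow/dags"
echo "docker run --volume ${HOST_DAGS}:${CONTAINER_DAGS} my-airflow-image"
```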

### Installing Python Dependencies

This Dockerfile supports installing Python dependencies via `pip` from a `requirements.txt` file. Place your `requirements.txt` file in the same directory as the Dockerfile. If it is in a different location, it can be specified using the `requirements_path` build argument. Keep in mind the Docker build context when copying the `requirements.txt` file. Uncomment the two appropriate lines in the Dockerfile that copy the `requirements.txt` file to the container and run `pip install` to install the dependencies on the container.

### Building Image for ECS Executor

Detailed instructions on using the Docker image you have created via this readme with the ECS executor can be found [here](link_to_how_to_guide).

## Logging

Airflow tasks executed via this executor run in ECS containers within the configured VPC. This means that logs are not directly accessible to the Airflow Webserver, and once containers are stopped after task completion, the logs are permanently lost.

Remote logging can be employed when using the ECS executor to persist your Airflow task logs and make them viewable from the Airflow Webserver.

### Configuring Remote Logging

There are many ways to configure remote logging and several supported destinations. A general overview of Airflow task logging can be found [here](https://airflow.apache.org/docs/apache-airflow/stable/administration-and-deployment/logging-monitoring/logging-tasks.html). Instructions for configuring S3 remote logging can be found [here](https://airflow.apache.org/docs/apache-airflow-providers-amazon/stable/logging/s3-task-handler.html) and Cloudwatch remote logging [here](https://airflow.apache.org/docs/apache-airflow-providers-amazon/stable/logging/cloud-watch-task-handlers.html).
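As a minimal sketch of an S3 setup, remote logging can be switched on via environment variables using Airflow's `[logging]` options `remote_logging`, `remote_base_log_folder`, and `remote_log_conn_id`. The bucket path and connection ID below are placeholders, and these settings must be applied both where the Webserver runs and inside the ECS task container:

```
# Enable remote logging to S3. The bucket path and connection ID are
# placeholders; apply these on the Webserver host and in the task container.
export AIRFLOW__LOGGING__REMOTE_LOGGING="True"
export AIRFLOW__LOGGING__REMOTE_BASE_LOG_FOLDER="s3://my-airflow-logs-bucket/logs"
export AIRFLOW__LOGGING__REMOTE_LOG_CONN_ID="aws_default"
```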
Some important things to point out for remote logging in the context of the ECS executor:

- The configuration options for Airflow remote logging must be configured on the host running the Airflow Webserver (so that it can fetch logs from the remote location) as well as within the ECS container running the Airflow tasks (so that it can upload the logs to the remote location). See [here](https://airflow.apache.org/docs/apache-airflow/stable/howto/set-config.html) to read more about how to set Airflow configuration via config file or environment variable exports.
- Adding the Airflow remote logging config to the container can be done in many ways. Some examples include, but are not limited to:

Review Comment:

> I'll add, with KubernetesExecutor, it's assumed that the "deployment manager" handles this themselves in the pod_template_file. We should use a similar approach for our example here.
