vara-bonthu commented on code in PR #25931: URL: https://github.com/apache/airflow/pull/25931#discussion_r956719047
##########
docs/apache-airflow-providers-amazon/logging/s3-task-handler.rst:
##########
@@ -47,3 +47,86 @@

You can also use `LocalStack <https://localstack.cloud/>`_ to emulate Amazon S3 locally.
To configure it, you must additionally set the endpoint url to point to your local stack.
You can do this via the Connection Extra ``host`` field.
For example, ``{"host": "http://localstack:4572"}``

Enabling remote logging for Amazon S3 with AWS IRSA
'''''''''''''''''''''''''''''''''''''''''''''''''''
`IRSA <https://docs.aws.amazon.com/eks/latest/userguide/iam-roles-for-service-accounts.html>`_ is a feature that allows you to assign an IAM role to a Kubernetes service account.
It works by leveraging a `Kubernetes <https://kubernetes.io/>`_ feature known as `Service Account <https://kubernetes.io/docs/tasks/configure-pod-container/configure-service-account/>`_ Token Volume Projection.
When Pods are configured with a Service Account that references an `IAM Role <https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles.html>`_, the Kubernetes API server calls the public OIDC discovery endpoint for the cluster on startup.
When an AWS API is invoked, the AWS SDK calls ``sts:AssumeRoleWithWebIdentity``, and IAM exchanges the Kubernetes-issued token for temporary AWS role credentials after validating the token's signature.

Using IAM roles for service accounts is the recommended best practice for accessing AWS services (e.g., S3) from Amazon EKS.
The steps below guide you through creating a new IAM role with a service account and using it with the Airflow webserver and worker (Kubernetes Executor) Pods.

Step 1: Create an IAM role for the service account (IRSA)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This step creates an IAM role and service account using `eksctl <https://eksctl.io/>`_.
Note that this example attaches a managed policy with full S3 permissions to the IAM role; this is for testing purposes only.
We highly recommend that you create a restricted S3 IAM policy and use it with ``--attach-policy-arn``.

Alternatively, you can use other IaC tools like Terraform. For an example of deploying Airflow with Terraform, including IRSA, check out this `link <https://github.com/aws-ia/terraform-aws-eks-blueprints/tree/main/examples/analytics/airflow-on-eks>`_.

Execute the following command, providing all the necessary inputs:

.. code-block:: bash

    eksctl create iamserviceaccount --cluster="<EKS_CLUSTER_ID>" --name="<SERVICE_ACCOUNT_NAME>" --namespace="<NAMESPACE>" --attach-policy-arn="<IAM_POLICY_ARN>" --approve

Example with sample inputs:

.. code-block:: bash

    eksctl create iamserviceaccount --cluster=airflow-eks-cluster --name=airflow-sa --namespace=airflow --attach-policy-arn=arn:aws:iam::aws:policy/AmazonS3FullAccess --approve
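As a minimal sketch of the restricted policy recommended above, the following grants only the S3 actions remote logging needs. The bucket name ``my-airflow-logs`` and policy name ``AirflowS3LoggingPolicy`` are hypothetical placeholders, not values created by the steps above; the ``Arn`` returned by ``aws iam create-policy`` is what you would pass to ``--attach-policy-arn``:

.. code-block:: bash

    # Sketch only: bucket name and policy name are placeholders --
    # replace them with your own values.
    cat > airflow-s3-logging-policy.json <<'EOF'
    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Action": ["s3:ListBucket"],
          "Resource": "arn:aws:s3:::my-airflow-logs"
        },
        {
          "Effect": "Allow",
          "Action": ["s3:GetObject", "s3:PutObject"],
          "Resource": "arn:aws:s3:::my-airflow-logs/*"
        }
      ]
    }
    EOF

    # Create the policy; note the "Arn" field in the output for --attach-policy-arn.
    aws iam create-policy \
        --policy-name AirflowS3LoggingPolicy \
        --policy-document file://airflow-s3-logging-policy.json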
Step 2: Update the Helm chart ``values.yaml`` with the service account
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This step uses the `Airflow Helm Chart <https://github.com/apache/airflow/tree/main/chart>`_ deployment.
If you are deploying Airflow using the Helm chart, you can modify ``values.yaml`` as shown below.
Add the service account (e.g., ``airflow-sa``) created in Step 1 to the Helm chart ``values.yaml`` under the following sections.
We are using the existing service account, hence ``create: false`` with the existing name ``name: airflow-sa``.

.. code-block:: yaml

    workers:
      serviceAccount:
        create: false
        name: airflow-sa
        # The annotations below are added automatically by Step 1, so you do not
        # need to specify them here; they are shown for information only.
        annotations:
          eks.amazonaws.com/role-arn: <ENTER_IAM_ROLE_ARN_CREATED_BY_EKSCTL_COMMAND>

    webserver:
      serviceAccount:
        create: false
        name: airflow-sa
        # The annotations below are added automatically by Step 1, so you do not
        # need to specify them here; they are shown for information only.
        annotations:
          eks.amazonaws.com/role-arn: <ENTER_IAM_ROLE_ARN_CREATED_BY_EKSCTL_COMMAND>

    config:
      logging:
        remote_logging: 'True'
        logging_level: 'INFO'
        remote_base_log_folder: 's3://<ENTER_YOUR_BUCKET_NAME>/<FOLDER_PATH>'  # Specify the S3 bucket used for logging
        remote_log_conn_id: 'aws_s3_conn'  # Note that this name is used in Step 3 when creating the connection in the Airflow UI
        delete_worker_pods: 'False'
        encrypt_s3_logs: 'True'

Step 3: Create an Amazon S3 connection in the Airflow Web UI
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
With the above configuration, the webserver and worker Pods can access the Amazon S3 bucket and write logs without using any access key, secret key, or instance profile credentials.

The final step is to create the connection in the Airflow UI before executing the DAGs:

* Log in to the Airflow Web UI with ``admin`` credentials and navigate to ``Admin -> Connections``
* Create a connection for ``S3`` and select the options (Connection ID and Connection Type) as shown in the image.
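As an optional sanity check before running DAGs, you can verify that the IRSA credentials are actually wired into the Pods. The namespace and service account names below follow the sample inputs from Step 1, and the ``airflow-webserver`` Deployment name assumes a Helm release named ``airflow``; the ``AWS_ROLE_ARN`` and ``AWS_WEB_IDENTITY_TOKEN_FILE`` environment variables are injected by the EKS Pod Identity Webhook:

.. code-block:: bash

    # Confirm the role ARN annotation was added to the service account by Step 1.
    kubectl get serviceaccount airflow-sa -n airflow \
        -o jsonpath='{.metadata.annotations.eks\.amazonaws\.com/role-arn}'

    # The webhook should have injected the web identity variables into the Pods.
    kubectl exec -n airflow deploy/airflow-webserver -- \
        env | grep -E 'AWS_ROLE_ARN|AWS_WEB_IDENTITY_TOKEN_FILE'

    # STS should report the IRSA role rather than a node instance profile
    # (boto3 is available in images that bundle the Amazon provider).
    kubectl exec -n airflow deploy/airflow-webserver -- python -c \
        "import boto3; print(boto3.client('sts').get_caller_identity()['Arn'])"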
Review Comment:
   Thanks! It does make sense to use the AWS base connection. I have updated the instructions and the image to reflect the Amazon Web Services base connection.

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]