zachliu commented on PR #51692: URL: https://github.com/apache/airflow/pull/51692#issuecomment-4057511365
For whoever is in the same boat: I've discovered an undocumented nuance of the AWS ECS API. `run_task()` returns containers in the same order as your task definition's `containerDefinitions`, but `describe_tasks()` does not. It randomly reorders them.

This matters because this PR switched from reading `containers[0]` from the `run_task()` response to reading it from the `describe_tasks()` response. So if you have sidecar containers (e.g. datadog-agent), `containers[0]` will randomly point to the sidecar instead of your main app container. This causes the operator to construct the wrong CloudWatch log stream name, resulting in:

```
botocore.errorfactory.ResourceNotFoundException: An error occurred (ResourceNotFoundException) when calling the GetLogEvents operation: The specified log stream does not exist.
```

I verified this experimentally: across 10 `run_task()` calls, `containers[0]` was always the app container (0/10 wrong). Across 10 `describe_tasks()` calls for the same tasks, `containers[0]` was the sidecar 4/10 times.

The fix is to always pass `container_name` explicitly to `EcsRunTaskOperator` (available since 9.3.0).
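To illustrate why selecting by name is safer than indexing, here is a minimal sketch that picks a container out of a `describe_tasks()`-shaped response by its `name` field. It is not the operator's actual code: the helpers `find_container` and `log_stream_name`, the container names, the `ecs` stream prefix, and the ARN are all made up for the example, and the response is a hand-written dict rather than a real API call. The stream name follows the `prefix/container-name/task-id` layout that the `awslogs` driver uses when a stream prefix is set.

```python
from typing import Any


def find_container(task: dict[str, Any], container_name: str) -> dict[str, Any]:
    """Return the container with the given name, regardless of list order."""
    for container in task.get("containers", []):
        if container.get("name") == container_name:
            return container
    raise LookupError(f"no container named {container_name!r} in task")


def log_stream_name(stream_prefix: str, container_name: str, task_arn: str) -> str:
    """Build an awslogs-style stream name: prefix/container-name/task-id."""
    task_id = task_arn.rsplit("/", 1)[-1]  # last ARN segment is the task ID
    return f"{stream_prefix}/{container_name}/{task_id}"


# Hand-written stand-in for a describe_tasks() response (hypothetical values).
# The sidecar is listed first on purpose, mimicking the reordering described above.
describe_tasks_response = {
    "tasks": [
        {
            "taskArn": "arn:aws:ecs:us-east-1:123456789012:task/my-cluster/0123456789abcdef0123456789abcdef",
            "containers": [
                {"name": "datadog-agent", "lastStatus": "RUNNING"},
                {"name": "app", "lastStatus": "RUNNING"},
            ],
        }
    ]
}

task = describe_tasks_response["tasks"][0]

# Indexing would pick the sidecar here; looking up by name does not depend on order.
app = find_container(task, "app")
stream = log_stream_name("ecs", app["name"], task["taskArn"])
print(stream)
```

Raising `LookupError` on a missing name is a design choice for the sketch: failing early with a clear message seems preferable to silently reading logs from whichever container happens to come first.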
