zachliu commented on PR #51692:
URL: https://github.com/apache/airflow/pull/51692#issuecomment-4057511365

   For anyone in the same boat: I've found an undocumented quirk of the AWS 
ECS API. `run_task()` returns containers in the same order as your task 
definition's `containerDefinitions`, but `describe_tasks()` does not. It 
returns them in an apparently random order.
   
   This matters because this PR switched from reading `containers[0]` from the 
`run_task()` response to reading it from the `describe_tasks()` response. So if 
you have sidecar containers (e.g. datadog-agent), `containers[0]` will sometimes 
be the sidecar instead of your main app container. The operator then 
constructs the wrong CloudWatch log stream name, resulting in:
   
   ```
   botocore.errorfactory.ResourceNotFoundException: An error occurred 
(ResourceNotFoundException) when calling the GetLogEvents operation: The 
specified log stream does not exist.
   ```
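
To make the failure mode concrete, here is a minimal sketch of how picking the container by index versus by name changes the stream name. It assumes the usual awslogs layout `prefix/container-name/task-id`; the helper names, the `ecs` prefix, the ARN, and the sample response are all made up for illustration and are not the operator's actual code.

```python
from typing import Any


def log_stream_name(prefix: str, container_name: str, task_arn: str) -> str:
    """Build a stream name in the awslogs layout: prefix/container-name/task-id."""
    task_id = task_arn.rsplit("/", 1)[-1]  # the task ID is the last ARN segment
    return f"{prefix}/{container_name}/{task_id}"


def container_by_name(task: dict[str, Any], name: str) -> dict[str, Any]:
    """Look a container up by name instead of trusting its list position."""
    for container in task["containers"]:
        if container["name"] == name:
            return container
    raise LookupError(f"no container named {name!r} in task")


# Hypothetical describe_tasks()-style entry with the sidecar listed first.
TASK = {
    "taskArn": "arn:aws:ecs:us-east-1:123456789012:task/my-cluster/0123456789abcdef0123456789abcdef",
    "containers": [{"name": "datadog-agent"}, {"name": "app"}],
}

# Index-based selection picks the sidecar for this response.
by_index = log_stream_name("ecs", TASK["containers"][0]["name"], TASK["taskArn"])
# Name-based selection is independent of the order in the response.
by_name = log_stream_name("ecs", container_by_name(TASK, "app")["name"], TASK["taskArn"])
```

With this sample, `by_index` names a stream under `datadog-agent` while `by_name` names the one under `app`, which is the stream the app container's logs would be written to under the assumed layout.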
   
   I verified this experimentally: across 10 `run_task()` calls, 
`containers[0]` was always the app container (0/10 wrong). Across 10 
`describe_tasks()` calls for the same tasks, `containers[0]` was the sidecar 
4/10 times.
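
If you want to repeat this check on your own tasks, a rough sketch of the tally could look like the following. The function name is hypothetical and the sample entries are invented to show the shape of the data; they are not my measurements. In a real run the task entries would come from something like `boto3.client("ecs").describe_tasks(cluster=..., tasks=[...])["tasks"]`.

```python
from collections import Counter
from typing import Any, Iterable


def first_container_counts(tasks: Iterable[dict[str, Any]]) -> Counter:
    """Count which container name appears at containers[0] across task entries."""
    return Counter(task["containers"][0]["name"] for task in tasks)


# Invented sample entries (illustration only), shaped like describe_tasks()["tasks"].
SAMPLE_TASKS = [
    {"containers": [{"name": "app"}, {"name": "datadog-agent"}]},
    {"containers": [{"name": "datadog-agent"}, {"name": "app"}]},
    {"containers": [{"name": "app"}, {"name": "datadog-agent"}]},
]

counts = first_container_counts(SAMPLE_TASKS)
```

Any non-zero count for a sidecar name means `containers[0]` is not a safe way to find the app container in that response.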
   
   The fix is to always pass `container_name` explicitly to 
`EcsRunTaskOperator` (available since 9.3.0).
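
A sketch of what that looks like, assuming a provider version that accepts `container_name`. The cluster, task definition, log group, prefix, and container name are placeholders, and a real Fargate task would also need settings such as `network_configuration` that are omitted here.

```python
# Placeholder values throughout; substitute your own cluster, task definition, etc.
ECS_TASK_KWARGS = {
    "task_id": "run_app",
    "cluster": "my-cluster",
    "task_definition": "my-task-def",
    "launch_type": "FARGATE",
    "overrides": {"containerOverrides": []},
    "awslogs_group": "/ecs/my-app",
    "awslogs_stream_prefix": "ecs",
    # Name the main container explicitly so log lookup does not depend on
    # the order of containers in the describe_tasks() response.
    "container_name": "app",
}


def make_operator():
    """Build the operator; the import is deferred so this file loads without Airflow."""
    from airflow.providers.amazon.aws.operators.ecs import EcsRunTaskOperator

    return EcsRunTaskOperator(**ECS_TASK_KWARGS)
```

The value of `container_name` has to match the `name` of the app container in the task definition's `containerDefinitions`.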


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

