jreslock commented on issue #43717:
URL: https://github.com/apache/airflow/issues/43717#issuecomment-2714829442
> I've tried to reproduce the issue today, running ECS tasks that log 5, 50,
500, and 5000 lines of logging. Trying to run each one multiple times, and
every time I got every log line reliably. So there is some piece missing either
in my repro or the setup you all have running. This actually isn't too
surprising, since if it was very easy to reproduce, we'd have many people
cutting tickets (ECS is one of the most used AWS operators by far).
>
> Is there anything in your infra setup that would cause CloudWatch to take
longer to become consistent? Are you logging across regions? Or connecting
through VCPE or some other gateway that would slow down network traffic out of
your VPC? Any more specifics you can provide will be very helpful to try to
reproduce something!

I work with @yaningz and can speak a bit about our network configuration.
TL;DR: this is a single VPC with 3 private subnets, a VPC endpoint and 2
security groups.
- Airflow (MWAA) running in a single VPC attached to two private subnets
and a single security group.
- ECS tasks launched by the `EcsRunTaskOperator` are co-located on these same
2 private subnets plus one additional subnet. Each of our VPCs always has 3
private subnets for tasks for AZ redundancy.
- Each of the private subnets routes through a NAT Gateway for internet
egress.
- There is a VPC Endpoint configured for CloudWatch access. This endpoint is
associated with all 3 private subnets and has 2 security groups associated, one
allowing HTTPS (TCP 443) ingress from the IPv4 VPC CIDR and the second allowing
ingress from itself (any other entity associated with this SG will be allowed).
There is no transit gateway, VPC Lattice, cross-account access, or VPC
traversal in play. All other CloudWatch logging functionality appears to be
working as expected.
Could these missing log entries be caused by some unhandled exception in the
interaction between `task_log_fetcher`, `waiter`, and `waiter_with_logging`? I
tried to walk myself through this code last night but I quickly lost my way.
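
To make the question concrete, here is a minimal sketch of the failure mode I
have in mind. It is not the provider code: `PollingLogFetcher`, `run_task`, and
`fail_at` are hypothetical names, and the list of strings stands in for
CloudWatch log events. It relies only on standard Python behavior, namely that
an exception raised inside a `threading.Thread` ends that thread without
propagating to the caller of `join()`.

```python
import threading


class PollingLogFetcher(threading.Thread):
    """Hypothetical stand-in for a background log fetcher (not the Airflow class)."""

    def __init__(self, events, fail_at=None):
        super().__init__(daemon=True)
        self.events = events      # pretend CloudWatch log events
        self.fail_at = fail_at    # index at which a simulated fetch error occurs
        self.collected = []       # log lines that made it into the task log

    def run(self):
        for index, event in enumerate(self.events):
            if index == self.fail_at:
                # Nothing catches this, so the thread dies here.
                raise RuntimeError("simulated fetch failure")
            self.collected.append(event)


def run_task(events, fail_at=None):
    """Start the fetcher, wait for it, and return whatever it collected."""
    fetcher = PollingLogFetcher(events, fail_at=fail_at)
    fetcher.start()
    # join() returns normally even if run() raised, so the caller sees no error.
    fetcher.join()
    return fetcher.collected


if __name__ == "__main__":
    print(run_task(["line 1", "line 2", "line 3"]))
    print(run_task(["line 1", "line 2", "line 3"], fail_at=1))
```

If something similar happened in the real fetcher thread, the task would still
finish normally while the tail of the log went missing. The only trace would be
a traceback written to stderr by the default `threading.excepthook`.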