jreslock commented on issue #43717:
URL: https://github.com/apache/airflow/issues/43717#issuecomment-2714829442

   > I've tried to reproduce the issue today, running ECS tasks that log 5, 50, 500, and 5000 lines of logging. I ran each one multiple times, and every time I got every log line reliably. So there is some piece missing, either in my repro or in the setup you all have running. This actually isn't too surprising, since if it were very easy to reproduce, we'd have many people cutting tickets (ECS is one of the most used AWS operators by far).
   > 
   > Is there anything in your infra setup that would cause CloudWatch to take longer to become consistent? Are you logging across regions? Or connecting through a VPCE or some other gateway that would slow down network traffic out of your VPC? Anything more specific you can provide will be very helpful in trying to reproduce this!
   
   I work with @yaningz and can speak a bit about our network configuration.
   
   TL;DR: this is a single VPC with 3 private subnets, a VPC endpoint, and 2 security groups.
   
   - Airflow (MWAA) runs in a single VPC, attached to two private subnets and a single security group.
   - ECS tasks launched by the `EcsRunTaskOperator` are co-located on the same 2 private subnets plus one additional subnet. Each of our VPCs always has 3 private subnets for tasks, for AZ redundancy.
     - Each of the private subnets routes through a NAT Gateway for internet egress.
   - There is a VPC endpoint configured for CloudWatch access. It is associated with all 3 private subnets and has 2 security groups attached: one allowing HTTPS/443 ingress from the VPC's IPv4 CIDR, and a second allowing ingress from itself (any other entity associated with that SG is allowed).
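One detail worth ruling out in a setup like this: if an interface endpoint does not have private DNS enabled, SDK calls to the default regional CloudWatch Logs hostname resolve to public addresses and leave through the NAT Gateway rather than the endpoint. The sketch below inspects the dict returned by boto3's `ec2.describe_vpc_endpoints()`. It assumes the endpoint in question is the CloudWatch Logs one (`com.amazonaws.<region>.logs`), which the description above does not state explicitly; `check_logs_endpoint` is a hypothetical helper, not part of boto3 or Airflow.

```python
from typing import Any, Dict, List


def check_logs_endpoint(response: Dict[str, Any], region: str) -> List[str]:
    """Return problems found with the CloudWatch Logs interface endpoint(s).

    `response` is the dict returned by boto3's ec2.describe_vpc_endpoints().
    An empty list means these particular checks found nothing wrong.
    """
    service = f"com.amazonaws.{region}.logs"
    endpoints = [
        ep for ep in response.get("VpcEndpoints", [])
        if ep.get("ServiceName") == service
    ]
    if not endpoints:
        return [f"no VPC endpoint for {service}"]

    problems: List[str] = []
    for ep in endpoints:
        ep_id = ep.get("VpcEndpointId", "<unknown>")
        # Anything other than "available" (compared case-insensitively)
        # means the endpoint is not serving traffic yet or anymore.
        if str(ep.get("State", "")).lower() != "available":
            problems.append(f"{ep_id}: state is {ep.get('State')!r}")
        # Without private DNS, the default Logs hostname resolves to public
        # IPs, so SDK traffic egresses via the NAT Gateway instead.
        if not ep.get("PrivateDnsEnabled", False):
            problems.append(f"{ep_id}: private DNS is disabled")
    return problems
```

For example, `check_logs_endpoint(boto3.client("ec2").describe_vpc_endpoints(Filters=[{"Name": "vpc-id", "Values": ["vpc-..."]}]), "us-east-1")`, where the VPC ID and region are placeholders.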
   
   There is no transit gateway, VPC Lattice, cross-account access, or VPC traversal in play, and I'll add that all other CloudWatch logging functionality appears to be working as expected.
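A quick way to tell a CloudWatch delivery or consistency problem apart from lines being dropped on the Airflow side is to read the task's log stream back directly after the task finishes and compare the line count with what the container printed. The sketch below assumes a boto3 CloudWatch Logs client; the log group and stream names are placeholders for whatever the task definition's `awslogs` configuration produces, and `iter_log_messages` is a hypothetical helper. It relies on the documented `GetLogEvents` behaviour that the end of the stream is reached when `nextForwardToken` comes back equal to the token that was sent.

```python
from typing import Any, Iterator


def iter_log_messages(client: Any, group: str, stream: str) -> Iterator[str]:
    """Yield every message in a CloudWatch Logs stream, oldest first.

    `client` is expected to be a boto3 "logs" client (or anything exposing
    the same get_log_events keyword arguments).
    """
    kwargs = {"logGroupName": group, "logStreamName": stream, "startFromHead": True}
    token = None
    while True:
        if token is not None:
            kwargs["nextToken"] = token
        response = client.get_log_events(**kwargs)
        for event in response["events"]:
            yield event["message"]
        next_token = response["nextForwardToken"]
        # GetLogEvents returns the same forward token once the stream is exhausted.
        if next_token == token:
            break
        token = next_token
```

For example, `len(list(iter_log_messages(boto3.client("logs"), group, stream)))` can be compared with the 5/50/500/5000 line counts from the repro quoted above.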
   
   Could these missing log entries be caused by some unhandled exception in the interaction between `task_log_fetcher`, `waiter`, and `waiter_with_logging`? I tried to walk myself through this code last night, but I quickly lost my way.
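To make the "unhandled exception" hypothesis concrete: in Python, an exception raised inside a `threading.Thread` ends only that thread, while the code waiting on the task carries on as if nothing happened. The sketch below is hypothetical and is not Airflow's `task_log_fetcher` or waiter code; `PollingLogFetcher`, `wait_for_task`, and `fetch_batch` (a stand-in for a CloudWatch read) are illustrative names. It shows two safeguards: recording the fetcher's exception so the waiter can surface it, and doing a final drain after stopping so lines written just before completion are not lost.

```python
import threading
import time
from typing import Callable, List, Optional


class PollingLogFetcher(threading.Thread):
    """Hypothetical background log fetcher (not Airflow's implementation)."""

    def __init__(self, fetch_batch: Callable[[int], List[str]], interval: float = 0.01):
        super().__init__(daemon=True)
        self.fetch_batch = fetch_batch  # returns lines starting at the given offset
        self.interval = interval
        self.lines: List[str] = []
        self.error: Optional[BaseException] = None
        self.stop_requested = threading.Event()

    def run(self) -> None:
        offset = 0
        try:
            while not self.stop_requested.is_set():
                batch = self.fetch_batch(offset)
                self.lines.extend(batch)
                offset += len(batch)
                time.sleep(self.interval)
            # Final drain: pick up lines written between the last poll and the stop request.
            self.lines.extend(self.fetch_batch(offset))
        except Exception as exc:
            # Without this, the exception would end the thread and the waiter
            # would never learn that logs stopped being forwarded.
            self.error = exc


def wait_for_task(fetcher: PollingLogFetcher, task_done: Callable[[], bool],
                  poll: float = 0.01) -> List[str]:
    """Wait for the (hypothetical) task, then stop the fetcher and surface errors."""
    fetcher.start()
    while not task_done():
        time.sleep(poll)
    fetcher.stop_requested.set()
    fetcher.join()
    if fetcher.error is not None:
        raise RuntimeError("log fetcher failed; logs may be incomplete") from fetcher.error
    return fetcher.lines
```

Re-raising after `join()` trades a "successful" run with silently missing logs for an explicit failure, which is easier to spot when lines go missing.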


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
