shivaam commented on PR #64326: URL: https://github.com/apache/airflow/pull/64326#issuecomment-4146318780
Nice. Seems like a real production bug. A few thoughts:

1. The default of 512 may be too low. The scheduler processes all active DAGs every cycle. With 1000+ DAGs, a cache of 512 means constant eviction and re-fetching from the DB on every loop. The API server's Execution API also serves worker requests for every task state transition, so it can accumulate entries quickly too. Consider starting higher (2048+) and letting people tune it down: it's easier to reduce a known number than to discover you need to increase one you didn't know existed.

2. A single config for both the scheduler and the API server may not be ideal. The scheduler's working set is bounded (the latest version per active DAG) and performance-sensitive, so it needs a cache big enough to hold all active DAGs.
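To make point 1 concrete, here is a rough sketch of the eviction pattern I have in mind. It is not the PR's code: `LRUCache` and `simulate` are hypothetical names, and it assumes the cache evicts in least-recently-used order and that the scheduler touches each active DAG once per loop in the same order.

```python
from collections import OrderedDict


class LRUCache:
    """Minimal LRU cache that counts hits and misses (illustrative only)."""

    def __init__(self, maxsize):
        self.maxsize = maxsize
        self._data = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key, loader):
        if key in self._data:
            # Mark the entry as most recently used.
            self._data.move_to_end(key)
            self.hits += 1
            return self._data[key]
        # Miss: stands in for a DB fetch of the serialized DAG.
        self.misses += 1
        value = loader(key)
        self._data[key] = value
        if len(self._data) > self.maxsize:
            # Evict the least recently used entry.
            self._data.popitem(last=False)
        return value


def simulate(num_dags, cache_size, loops):
    """Touch every DAG once per loop, in order; return (hits, misses)."""
    cache = LRUCache(cache_size)
    for _ in range(loops):
        for dag_index in range(num_dags):
            cache.get(f"dag_{dag_index}", loader=lambda key: f"serialized:{key}")
    return cache.hits, cache.misses


if __name__ == "__main__":
    for size in (512, 2048):
        hits, misses = simulate(num_dags=1000, cache_size=size, loops=5)
        print(f"cache_size={size}: hits={hits}, misses={misses}")
```

Under those assumptions, a cache smaller than the working set never gets a hit on a cyclic scan, because each entry is evicted before the loop comes back to it. Once the cache holds every active DAG, only the first loop misses. Real access patterns are less regular, so treat this as the worst case rather than a measurement.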
