adityav edited a comment on issue #6340: [Airflow-5660] Try to find the task in DB before regressing to search…
URL: https://github.com/apache/airflow/pull/6340#issuecomment-547511993

> Have you run this in a Kube cluster? I have a feeling that _every_ task will hit the bad path because of the characters in the execution date

We are running Airflow in EKS with this patch applied, and it works. We can finally scale to 100,000+ tasks with it; previously it would choke at 5k-10k tasks.

Only the dag_id and task_id are stored in the labels. The execution date isn't stored in the labels, so it shouldn't be a problem.

Ideally I would prefer to eliminate the bad path altogether. Currently, avoiding it requires the DAG writer to choose a label-safe dag_id / task_id, which isn't good design. I can only think of two solutions:

1. Use the task_id / dag_id stored in environment variables. Values stored in environment variables don't have any label-specific restrictions. However, I am not familiar enough with the Kubernetes API to know how easy that is to do.
2. Have a mapping table of (exec_date, dag_id, task_id, safe_dag_id, safe_task_id) in the Airflow metadata DB.
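
To make the mapping-table idea concrete, here is a rough sketch, not code from this patch. It assumes the Kubernetes rule that a label value is at most 63 characters and starts and ends with an alphanumeric character, with `-`, `_` and `.` allowed in between. The names `is_safe_label`, `make_safe_label`, `register` and `resolve` are hypothetical, and a plain dict stands in for the proposed metadata DB table.

```python
import hashlib
import re

# Assumed Kubernetes label-value rule: <= 63 chars, alphanumeric at both ends,
# '-', '_' and '.' allowed in between (the empty string is also valid).
LABEL_VALUE_RE = re.compile(r"(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])?")
MAX_LABEL_LEN = 63


def is_safe_label(value: str) -> bool:
    """Return True if value can be used as a label value unchanged."""
    return len(value) <= MAX_LABEL_LEN and LABEL_VALUE_RE.fullmatch(value) is not None


def make_safe_label(value: str) -> str:
    """Hypothetical helper: keep safe ids as-is, otherwise sanitize and add a hash suffix."""
    if is_safe_label(value):
        return value
    # The short hash keeps two different unsafe ids from collapsing to the same label.
    digest = hashlib.sha256(value.encode("utf-8")).hexdigest()[:9]
    cleaned = re.sub(r"[^A-Za-z0-9_.-]", "", value).strip("-_.")
    cleaned = cleaned[: MAX_LABEL_LEN - len(digest) - 1].rstrip("-_.")
    return f"{cleaned}-{digest}" if cleaned else digest


# Stand-in for the proposed mapping table in the metadata DB (a dict, not a real table).
label_map = {}


def register(exec_date: str, dag_id: str, task_id: str) -> tuple:
    """Record the original ids under their label-safe forms and return the safe pair."""
    safe = (make_safe_label(dag_id), make_safe_label(task_id))
    label_map[(exec_date,) + safe] = (dag_id, task_id)
    return safe


def resolve(exec_date: str, safe_dag_id: str, safe_task_id: str) -> tuple:
    """Look up the original (dag_id, task_id) from the values stored in the labels."""
    return label_map[(exec_date, safe_dag_id, safe_task_id)]
```

With something like this, the executor would always look tasks up by key, so there would be no bad path regardless of what the DAG writer puts in the ids.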
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

With regards,
Apache Git Services
