adityav edited a comment on issue #6340: [Airflow-5660] Try to find the task in 
DB before regressing to search…
URL: https://github.com/apache/airflow/pull/6340#issuecomment-547511993
 
 
   > Have you run this in a Kube cluster? I have a feeling that _every_ task 
will hit the bad path because of the characters in the execution date
   
   We are running Airflow on EKS with this patch applied, and it works. We can 
finally scale to 100,000+ tasks with it. Previously, Airflow would choke at 
5k-10k tasks.
   Only dag_id / task_id are stored in the labels. The execution date isn't 
stored there, so its characters shouldn't be a problem. 
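   
   To illustrate why the execution date would matter if it were stored in a 
label: Kubernetes label values are limited to 63 characters, must start and 
end with an alphanumeric character, and may only contain `-`, `_` and `.` in 
between. The check below is a minimal sketch of that rule; the helper name 
`is_valid_label_value` is made up for illustration and is not part of Airflow.

```python
import re

# Kubernetes label value rule: empty, or alphanumeric at both ends with
# '-', '_', '.' allowed in between, and at most 63 characters overall.
_LABEL_VALUE_RE = re.compile(r"(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])?")


def is_valid_label_value(value: str) -> bool:
    """Hypothetical helper: True if `value` can be used as a label value."""
    return len(value) <= 63 and _LABEL_VALUE_RE.fullmatch(value) is not None


if __name__ == "__main__":
    # An ISO-style execution date contains ':' and '+', so it is rejected.
    for candidate in ("example_dag", "2019-10-29T16:00:00+00:00", "my dag"):
        print(candidate, is_valid_label_value(candidate))
```

   A dag_id or task_id that fails this check is what falls into the bad path.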
   
   Ideally, I would prefer to eliminate the bad path altogether. Currently, 
avoiding it requires the DAG writer to choose label-safe dag_id / task_id 
values, which isn't good design. I can only think of two solutions:
   
   1. Use the task_id / dag_id stored in env variables. Values stored in env 
variables don't have any label-specific restrictions. However, I am not 
familiar enough with the Kube API to know how easy that is to do.
   2. Have a mapping table of (exec_date, dag_id, task_id, safe_dag_id, 
safe_task_id) in the Airflow metadata DB.
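   
   A rough sketch of what option 2 could look like, using an in-memory SQLite 
database as a stand-in for the metadata DB. Everything here is an assumption 
made for illustration: the table name `task_label_map`, the helpers 
`make_safe`, `record_task` and `lookup_task`, and the hashing scheme are 
hypothetical and are not taken from Airflow or from this PR.

```python
import hashlib
import re
import sqlite3


def make_safe(value: str) -> str:
    """Hypothetical helper: derive a label-safe token from an arbitrary id.

    Disallowed characters become '-', the result is truncated, and a short
    hash of the original value is appended to keep distinct ids distinct.
    """
    cleaned = re.sub(r"[^A-Za-z0-9_.-]", "-", value).strip("-_.")
    cleaned = cleaned[:50].strip("-_.")
    digest = hashlib.sha256(value.encode("utf-8")).hexdigest()[:8]
    return f"{cleaned}-{digest}" if cleaned else digest


def create_mapping_table(conn: sqlite3.Connection) -> None:
    # Columns follow the tuple proposed in the comment.
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS task_label_map (
            exec_date    TEXT NOT NULL,
            dag_id       TEXT NOT NULL,
            task_id      TEXT NOT NULL,
            safe_dag_id  TEXT NOT NULL,
            safe_task_id TEXT NOT NULL,
            PRIMARY KEY (exec_date, safe_dag_id, safe_task_id)
        )
        """
    )


def record_task(conn, exec_date, dag_id, task_id):
    """Store the mapping and return the safe ids to put in the labels."""
    safe_dag_id, safe_task_id = make_safe(dag_id), make_safe(task_id)
    conn.execute(
        "INSERT OR REPLACE INTO task_label_map VALUES (?, ?, ?, ?, ?)",
        (exec_date, dag_id, task_id, safe_dag_id, safe_task_id),
    )
    return safe_dag_id, safe_task_id


def lookup_task(conn, exec_date, safe_dag_id, safe_task_id):
    """Return (dag_id, task_id) for the safe ids, or None if unknown."""
    return conn.execute(
        "SELECT dag_id, task_id FROM task_label_map "
        "WHERE exec_date = ? AND safe_dag_id = ? AND safe_task_id = ?",
        (exec_date, safe_dag_id, safe_task_id),
    ).fetchone()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_mapping_table(conn)
    labels = record_task(conn, "2019-10-29T16:00:00+00:00", "my dag", "task #1")
    print(labels, lookup_task(conn, "2019-10-29T16:00:00+00:00", *labels))
```

   With a table like this, the original ids are recovered by an indexed 
lookup rather than by a search, at the cost of one extra write per task.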
   
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
