Hello,

I have an externally triggered DAG in which some processing is done on an external system (an EC2 instance). An Airflow task starts the processing via an HTTP request, and the DAG should continue only once that processing has completed. In a first, naive implementation I created a sensor that polls the progress via HTTP request and returns true only when the status is "finished", at which point the DAG run continues.

That works, but the external processing can take hours or days, and for that whole time a worker is occupied doing nothing but HTTP GET and sleep. There will be hundreds of DAG runs in parallel, which means hundreds of occupied workers. I looked into other operators that run computation on external systems (ECSOperator, AWSBatchOperator), but they follow the same pattern and just wait/sleep.

Is there a more efficient way to build such a workflow with Airflow?

Kind regards,
Stefan
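
For illustration, the polling pattern described above can be sketched in plain Python roughly as follows. This is a minimal sketch and not the actual sensor code: the status URL, the JSON response shape, and the function names `fetch_status`, `poke`, and `wait_until_finished` are all assumptions made for the example.

```python
import json
import time
from typing import Callable
from urllib.request import urlopen

# Hypothetical endpoint; the real status URL is not given in this post.
STATUS_URL = "http://processing.example.com/jobs/{job_id}/status"


def fetch_status(job_id: str) -> str:
    """GET the job status. Assumes a JSON body such as {"status": "finished"}."""
    with urlopen(STATUS_URL.format(job_id=job_id), timeout=10) as resp:
        return json.load(resp)["status"]


def poke(job_id: str, fetch: Callable[[str], str] = fetch_status) -> bool:
    """Return True only once the external processing reports "finished"."""
    return fetch(job_id) == "finished"


def wait_until_finished(
    job_id: str,
    fetch: Callable[[str], str] = fetch_status,
    poke_interval: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll until finished and return the number of status checks made.

    The loop blocks between checks, which is what keeps one worker
    occupied for the whole duration of the external processing.
    """
    pokes = 0
    while True:
        pokes += 1
        if poke(job_id, fetch):
            return pokes
        sleep(poke_interval)
```

The `fetch` and `sleep` parameters are injectable only so the loop can be exercised without a real endpoint or real waiting.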