Hi All,

We have a use case where we are building an Airflow DAG consisting of a few tasks, and each task (HttpOperator) calls a service running behind an AWS Elastic Load Balancer (ELB).
Since these tasks are long-running, I'm getting a 504 GATEWAY TIMEOUT HTTP status code, which results in an incorrect task status on the Airflow side.

IMO, to solve this problem we can choose between the following approaches:

- Make a call to the service; the service sends back a response right away and processes the actual request in another thread/process. A monitoring thread would heartbeat the task status to a DB. On the Airflow side, immediately after each HttpOperator task, we would have a sensor that checks for the status change at a given poke interval.
- Since we have around 1500 tasks running per hour, use a service discovery system such as Apache ZooKeeper to pick a node in round-robin fashion and make a direct connection to the node running the service.

Changing the ELB configuration is not an option: AWS ELB limits the HTTP idle timeout to 1 hr, and my tasks take ~3 hr to finish.

Both approaches have cons. The first one makes us change our current flow in each service, i.e. handle the request asynchronously and heartbeat the status of the executing process/thread at some interval, hence the extra DB writes.

I'm interested to know how you are handling this problem, and any suggestions or improvements to the approaches above are welcome.

Thanks,
Amit
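
P.S. To make the first approach concrete, here is a rough, stdlib-only sketch of the submit-then-poll flow. It is not Airflow code: `FakeService`, `poke`, `wait_for_job`, the job-id format, and the status strings `RUNNING`/`SUCCESS`/`FAILED` are all made-up names for illustration. In a real DAG, `submit` would be the short HTTP call that returns before the ELB idle timeout, and the body of `poke` is roughly what a sensor's `poke` method would do on each poke interval.

```python
import time
from typing import Callable, Dict


class JobFailed(Exception):
    """Raised when the (hypothetical) service reports a failed job."""


class FakeService:
    """In-memory stand-in for the real service behind the ELB (assumption)."""

    def __init__(self, polls_until_done: int) -> None:
        self._polls_until_done = polls_until_done
        self._polls: Dict[str, int] = {}

    def submit(self) -> str:
        # The real service would start the work in another thread/process
        # and return a job id immediately, well within the idle timeout.
        job_id = f"job-{len(self._polls) + 1}"
        self._polls[job_id] = 0
        return job_id

    def status(self, job_id: str) -> str:
        # The real service would read the heartbeat/status row from the DB.
        self._polls[job_id] += 1
        done = self._polls[job_id] > self._polls_until_done
        return "SUCCESS" if done else "RUNNING"


def poke(get_status: Callable[[str], str], job_id: str) -> bool:
    """One sensor check: True when done, False to keep waiting."""
    status = get_status(job_id)
    if status == "FAILED":
        raise JobFailed(job_id)
    return status == "SUCCESS"


def wait_for_job(
    get_status: Callable[[str], str],
    job_id: str,
    poke_interval: float = 60.0,
    timeout: float = 4 * 3600.0,
) -> None:
    """Poll until the job succeeds, fails, or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        if poke(get_status, job_id):
            return
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{job_id} did not finish within {timeout}s")
        time.sleep(poke_interval)


if __name__ == "__main__":
    service = FakeService(polls_until_done=3)
    job_id = service.submit()  # short request, returns right away
    wait_for_job(service.status, job_id, poke_interval=0.0, timeout=5.0)
    print(f"{job_id} finished")
```

Each status check is a short request of its own, so no single HTTP connection has to stay open for the ~3 hr the task runs; the cost is the extra status writes and reads mentioned above.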
