Hi,

I'm interested in integrating some traditional batch systems with Airflow so I
can run against whatever batch resources are available. My use case is running
a single Airflow instance as a multi-tenant service that can dispatch to
heterogeneous batch systems distributed across the globe. A system I maintain
already does this, and I know HTCondor+DAGMan can do it by treating the batch
systems as "grid resources". I'm trying to understand whether this even makes
sense to attempt with Airflow, so I have a few questions.

1. Has anyone looked into or tried this before? I've searched for several hours
and wasn't able to find much on it.

2. I have a rough idea of how Airflow works, but I haven't dug deep into the
code. If I were to implement something like this, should it be done as an
operator (e.g. extend BashOperator?), as an executor (like the Mesos executor),
or maybe both? I've put a rough sketch of the operator route at the bottom of
this message.

3. I've done this kind of thing in the past, and typically you end up with a
daemon/microservice running for each batch system. That microservice may be
local to the batch system (which works best for LSF/Torque/etc.), or it may be
local to the workflow engine but talking to some sort of exported remote API
(e.g. grid-connected resources, often using Globus APIs and X.509 certs), or
there may be another layer of abstraction involved (as with DIRAC). Then you
have a wrapper/pilot script which traps a few signals and reports back to the
microservice or to a message queue (usually over HTTP or email, because some
batch systems sit behind restrictive firewalls) when a job actually starts or
finishes. I've sketched that pilot side at the bottom as well.
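
To make question 2 concrete, here is very roughly what I'm picturing for the
operator route. This is only a sketch: the submit command, the status callable,
and the "DONE"/"FAILED" states are placeholders, not any particular batch
system's interface.

import subprocess
import time

from airflow.models import BaseOperator
from airflow.utils.decorators import apply_defaults


class BatchSubmitOperator(BaseOperator):
    """Submit a job to an external batch system and block until it finishes."""

    @apply_defaults
    def __init__(self, submit_cmd, status_cmd, poll_interval=60, *args, **kwargs):
        super(BatchSubmitOperator, self).__init__(*args, **kwargs)
        self.submit_cmd = submit_cmd    # e.g. ["bsub", "payload.sh"] (placeholder)
        self.status_cmd = status_cmd    # callable: job_id -> "RUNNING"/"DONE"/"FAILED"
        self.poll_interval = poll_interval

    def execute(self, context):
        # Submit and keep whatever job id the batch system prints on stdout.
        job_id = subprocess.check_output(self.submit_cmd).strip()
        while True:
            state = self.status_cmd(job_id)
            if state == "DONE":
                return
            if state == "FAILED":
                raise RuntimeError("batch job %s failed" % job_id)
            time.sleep(self.poll_interval)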
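
And here is the rough shape of the wrapper/pilot script from question 3: run
the real payload, trap the signals batch systems typically send on
kill/preemption, and phone home over HTTP when the job starts and finishes.
The callback URL and the JSON fields are placeholders for whatever the
microservice would actually expect.

import signal
import subprocess
import sys

import requests

CALLBACK = "https://workflow-service.example.org/jobs/%s/events"  # placeholder


def report(job_id, event, detail=None):
    # Phone home to the microservice / message-queue front end.
    requests.post(CALLBACK % job_id, json={"event": event, "detail": detail},
                  timeout=30)


def main(job_id, cmd):
    def on_signal(signum, frame):
        # The batch system is killing or preempting us; say so before we go.
        report(job_id, "killed", {"signal": signum})
        sys.exit(128 + signum)

    for sig in (signal.SIGTERM, signal.SIGINT, signal.SIGUSR1):
        signal.signal(sig, on_signal)

    report(job_id, "started")
    rc = subprocess.call(cmd)              # the actual payload
    report(job_id, "finished", {"exit_code": rc})
    sys.exit(rc)


if __name__ == "__main__":
    main(sys.argv[1], sys.argv[2:])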

Thanks,
Brian
