At Airbnb, using the Celery executor, we use queues to wire tasks to machines provisioned in specific ways, and we use the cgroups feature to constrain resource utilization as we fire up tasks. That requires running the worker service as root, as that's a requirement to impersonate users and use cgroups.
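For context, the queue wiring described above is just a per-task attribute plus a matching flag when you start the worker. A minimal sketch of a DAG definition (the task id, queue name, and command here are hypothetical, not from Airbnb's setup):

```python
# Sketch: route a resource-hungry task to a dedicated Celery queue.
# Only workers started with that queue will pick the task up, so those
# workers can live on beefier machines (and, as above, run under
# cgroup constraints).
from datetime import datetime

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

dag = DAG("queue_example", start_date=datetime(2017, 8, 1))

heavy = BashOperator(
    task_id="crunch_numbers",        # hypothetical task
    bash_command="echo heavy work",  # stand-in for the real job
    queue="heavy_compute",           # workers must subscribe to this queue
    dag=dag,
)
```

Workers on the big machines would then be started with something like `airflow worker -q heavy_compute`, while the rest keep serving the default queue.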
In the context of Mesos, things may be different, as you may want to do that at a different layer. I'd read through the MesosExecutor to see if it does any of this already, or to figure out where you may be able to hook things up. Note that (from memory) the MesosExecutor relies on pickling to get serialized DAGs [through the database] to Mesos slots, and chances are high that we may deprecate that feature in the future. By that time we'll probably have a "DagFetcher" abstraction, allowing the DAG definition to be fetched another way on the fly.

Max

On Thu, Aug 3, 2017 at 10:24 AM, Victor Monteiro <[email protected]> wrote:

> Hi Stefano, have you read about queues? Airflow has this concept and I
> think you can decide which queue a task should go to. By doing this and
> integrating it with Mesos, I believe you can make a Mesos cluster with
> more resources get tasks from a certain queue specific for heavy
> computations.
>
> Maybe this can solve your problem (not sure) :D
>
> 2017-08-03 4:34 GMT-03:00 Stefano Baghino <[email protected]>:
>
> > Hi everyone,
> >
> > I'm investigating the possibility for our organization to use Airflow
> > for workflow management.
> >
> > Some requirements on our side regard resource management, and in
> > particular the possibility for the system to run tasks on top of Apache
> > Mesos. Airflow only partially satisfies our requirements in that
> > regard: after having a look at the docs and code, it appears to me
> > (correct me if I'm wrong) that resources are determined for the whole
> > system (via configuration) and cannot be set on a per-task basis. We'd
> > need this because some of our jobs are quite lightweight while others
> > may require a lot of resources, making a "one-size-fits-all"
> > configuration quite wasteful.
> >
> > I had a look at the AirflowMesosScheduler and MesosExecutor and thought
> > it would be nice to add this feature, and perhaps I can add it myself.
> > What I would need is some guidance on how to make this fit into the
> > overall system design: is there an established way to explicitly ask
> > for resources for a specific task in the DAG? If not, what could be a
> > possible way to introduce it? And if this reveals itself to be outside
> > the scope of Airflow, how do you think I can meet our requirement?
> >
> > Thanks in advance.
> >
> > P.S.: if by any chance some of you are on the Mesos mailing list as
> > well, you may know that I'm having issues making Airflow run
> > successfully on Mesos due to missing Python packages. I'm not sure
> > whether this mailing list is an appropriate place for users to get
> > help. If so, I could probably share that post here as well. Thanks!
> >
> > --
> > Stefano Baghino | TERALYTICS
> > *software engineer*
> >
> > Teralytics AG | Zollstrasse 62 | 8005 Zurich | Switzerland
> > phone: +41 43 508 24 57
> > email: [email protected]
> > www.teralytics.net
