---------- Forwarded message ---------
From: Max Goodridge <w...@maxgoodridge.com>
Date: Fri, Oct 26, 2018 at 9:41 PM
Subject: Class-based DAG Syntactic Sugar
To: <dev-subscr...@airflow.incubator.apache.org>
Hello team,

I would like to make a proposal that some people have said would make sense to merge upstream. If you like Django, it'll feel very familiar to you.

The following is copied from Slack (original here: https://apache-airflow.slack.com/archives/CCY359SCV/p1538585182000100):

We use Airflow to write a lot of DAGs, and coming from a Django background I found it frustrating that I had to repeat myself and write DAGs in a way that, in my opinion, could be more Pythonic. In particular, I had to specify dependencies by hand (90% of DAG operators have the same dependency chain) and create each DAG manually instead of just defining a class that does it for us.

We currently use an abstraction layer on top of Airflow. It's called "workflows": essentially, class-based DAGs. Let me illustrate with an example of what our DAGs look like:

```
import workflows  # the in-house "workflows" layer described above (not part of Airflow)


def python_callable():
    """Placeholder task body so the example is self-contained."""
    print("Doing something useful")


# This would create a DAG called `example_workflow` with two operators, the
# second dependent on the first, plus explicit DAG metadata (here, a schedule).
class ExampleWorkflow(workflows.Workflow):
    class Meta:
        schedule_interval = '0 9 * * *'

    do_something_useful = workflows.PythonOperator(
        python_callable=python_callable,
    )
    something_else = workflows.PythonOperator(
        python_callable=python_callable,
    )
```

We also currently have an extra line to work around Airflow's use of globals for DAG collection, but that would disappear nicely if we chose to merge this abstraction upstream. I considered doing some hacky things or maintaining a custom fork, but we decided against that for now.

Key points:
• Class attribute names are the default `task_id` for their associated operators; otherwise the operators are unchanged. (`task_id` can still be specified as normal, for easy backwards compatibility and easier migration of old DAGs to the new syntax.)
• The Django-inspired `Meta` class sets DAG information, including any arg/kwarg that you'd normally pass directly when constructing the `DAG` class (it could hold anything else too, though).
• A *default* (inherited) dependency structure that can be changed by overriding the relevant class method (our signature: `def dependencies(cls, operators):`).
• It's *class-based*, which importantly means we can use inheritance to eliminate repeated operators and metadata (e.g. posting to Slack, Datadog, etc.).
• Because DAG metadata can be inherited (as long as it is defined somewhere in the MRO), we can write similar DAGs very simply, for example by inheriting the schedule interval for a particular domain.

Any questions welcome. I would be happy to make the necessary changes.

--- End of Slack Message ---

Thank you to Kaxil Naik for the feedback so far and for the advice to post here to gauge interest in this abstraction.

Thanks,
Max
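To show how the key points above could fit together, here is a minimal, Airflow-free sketch of one possible implementation. It is not the actual "workflows" code, which the proposal doesn't include. `OperatorSpec`, `WorkflowMeta`, `build()` and `FanOutWorkflow` are hypothetical names, `build()` returns a plain dict instead of an `airflow.DAG`, and the default linear dependency chain is an assumption based on the two-operator example. A metaclass walks the MRO so operators and `Meta` options are inherited, attribute names become the default `task_id`, and `dependencies(cls, operators)` can be overridden per class.

```python
import re


class OperatorSpec:
    """Hypothetical stand-in for an operator declared as a class attribute."""

    def __init__(self, kind, task_id=None, **kwargs):
        self.kind = kind          # e.g. "PythonOperator"; only a label in this sketch
        self.task_id = task_id    # None means "use the class attribute name"
        self.kwargs = kwargs      # would be forwarded to the real operator


class WorkflowMeta(type):
    """Collects operators and Meta options from the whole MRO at class creation."""

    def __new__(mcs, name, bases, namespace):
        cls = super().__new__(mcs, name, bases, namespace)
        operators, meta = {}, {}
        # Walk from the most generic base to the class itself, so subclasses
        # inherit operators and DAG options and can override them by name.
        for klass in reversed(cls.__mro__):
            for attr, value in vars(klass).items():
                if isinstance(value, OperatorSpec):
                    operators[attr] = value
            inner_meta = vars(klass).get("Meta")
            if inner_meta is not None:
                meta.update(
                    {k: v for k, v in vars(inner_meta).items() if not k.startswith("_")}
                )
        cls._operators = operators
        cls._meta = meta
        return cls


class Workflow(metaclass=WorkflowMeta):
    @classmethod
    def dependencies(cls, operators):
        """Assumed default: a linear chain in declaration order.
        `operators` here is the list of resolved task_ids."""
        return list(zip(operators, operators[1:]))

    @classmethod
    def build(cls):
        """Return a plain description of the DAG; a real version would
        construct airflow's DAG with **cls._meta and real operators."""
        task_ids = [spec.task_id or attr for attr, spec in cls._operators.items()]
        return {
            # CamelCase class name -> snake_case dag_id ("ExampleWorkflow" -> "example_workflow")
            "dag_id": re.sub(r"(?<!^)(?=[A-Z])", "_", cls.__name__).lower(),
            "options": dict(cls._meta),
            "tasks": task_ids,
            "edges": cls.dependencies(task_ids),
        }


def python_callable():
    print("Doing something useful")


class ExampleWorkflow(Workflow):
    class Meta:
        schedule_interval = "0 9 * * *"

    do_something_useful = OperatorSpec("PythonOperator", python_callable=python_callable)
    something_else = OperatorSpec("PythonOperator", python_callable=python_callable)


class FanOutWorkflow(ExampleWorkflow):
    """Inherits the schedule and both operators, adds one with an explicit
    task_id, and overrides the dependency shape."""

    notify = OperatorSpec("PythonOperator", task_id="post_to_slack",
                          python_callable=python_callable)

    @classmethod
    def dependencies(cls, operators):
        # Fan out: every later task depends on the first one.
        first, *rest = operators
        return [(first, downstream) for downstream in rest]


print(ExampleWorkflow.build())
print(FanOutWorkflow.build())
```

A real integration would pass the collected `Meta` options as keyword arguments to `airflow.DAG` and instantiate real operators with the resolved `task_id`. The built DAG would still need to be bound to a module-level name, since Airflow discovers DAGs by scanning module globals; that is the kind of extra line the proposal mentions.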