On the topic of pointing the code view to YAML: would we alternatively consider adding a view in the UI that allows arbitrary text content? This could be accomplished by adding an optional parameter to the DAG object that lets you pass text (or a file path), which would then go through a renderer (e.g. Markdown). It could be a README, YAML content, or anything the author wanted.
Collin

On Fri, Aug 20, 2021 at 3:27 PM Shaw, Damian P. <[email protected]> wrote:

> FYI this is what I did on one of my past projects for Airflow.
>
> The users wanted to write their DAGs as YAML files, so my "DAG file" was a
> Python script that read the YAML files and converted them to DAGs. It was
> very easy to do and worked because of the flexibility of Airflow.
>
> The one thing that would have been nice, though, is if I could have easily
> changed the "code view" in Airflow to point to the relevant YAML file
> instead of the less useful "DAG file".
>
> Damian
>
> *From:* Jarek Potiuk <[email protected]>
> *Sent:* Friday, August 20, 2021 16:21
> *To:* [email protected]
> *Cc:* [email protected]
> *Subject:* Re: [DISCUSS] Adding better support for parametrized DAGs and
> dynamic DAGs using JSON/YAML data formats
>
> Airflow DAGs are Python code. This is a very basic assumption - which is
> not likely to change. Ever.
>
> And we are working on making it even more powerful. Writing DAGs in
> YAML/JSON makes them less powerful and less flexible. This is fine if you
> want to build on top of Airflow and create a more declarative way of
> defining DAGs, using Airflow to run them under the hood.
>
> If you think there is a group of users who can benefit from that - cool.
> You can publish the code to convert those to Airflow DAGs and submit it to
> our Ecosystem page. There are plenty of tools there, like "CWL - Common
> Workflow Language" and others:
>
> https://airflow.apache.org/ecosystem/#tools-integrating-with-airflow
>
> J.
>
> On Fri, Aug 20, 2021 at 2:48 PM Siddharth VP <[email protected]> wrote:
>
> Have we considered allowing DAGs in JSON/YAML formats before? I came up
> with a rather straightforward way to address parametrized and dynamic DAGs
> in Airflow, which I think makes dynamic DAGs work at scale.
>
> *Background / Current limitations:*
>
> 1. Dynamic DAG generation using single-file methods
> <https://www.astronomer.io/guides/dynamically-generating-dags#single-file-methods>
> can cause scalability issues
> <https://www.astronomer.io/guides/dynamically-generating-dags#scalability>
> when there are too many active DAGs per file. The dag_file_processor_timeout
> is applied to the loader file, so *all* dynamically generated DAGs need
> to be processed in that time. Sure, the timeout could be increased, but
> that may be undesirable (what if there are other static DAGs in the system
> on which we really want to enforce a small timeout?).
>
> 2. Parametrizing DAGs in Airflow is difficult. There is no good way to
> have multiple workflows that differ only by the choice of some constants.
> Using TriggerDagRunOperator to trigger a generic DAG with conf doesn't
> give a native-ish experience, as it creates DagRuns of the *triggered*
> DAG rather than *this* DAG - which also means a single scheduler log file.
>
> *Suggested approach:*
>
> 1. The user writes configuration files in JSON/YAML format. The schema can
> be arbitrary except for one condition: it must have a *builder* parameter
> with the path to a Python file.
>
> 2. The user writes the "builder" - a Python file containing a make_dag
> method that receives the parsed JSON/YAML and returns a DAG object. (Just
> a sample strategy; we could instead say the file should contain a class
> that extends an abstract DagBuilder class.)
>
> 3. Airflow reads JSON/YAML files as well from the dags directory. It
> parses the file, imports the builder Python file, passes the parsed
> JSON/YAML to it, and collects the generated DAG into the DagBag.
>
> *Sample implementation:*
>
> See
> https://github.com/siddharthvp/airflow/commit/47bad51fc4999737e9a300b134c04bbdbd04c88a;
> the only major code change is in dagbag.py.
>
> *Result:*
>
> DAG file processor logs show the YAML/JSON file (instead of the builder
> Python file).
> Each dynamically generated DAG gets its own scheduler log file.
>
> The configs dag_dir_list_interval, min_file_process_interval, and
> file_parsing_sort_mode all directly apply to DAG config files.
>
> If the JSON/YAML fails to parse, it is registered as an import error.
>
> Would like to know your thoughts on this. Thanks!
>
> Siddharth VP
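[For context, the single-file pattern Damian describes - and that the background section above critiques - usually looks something like the sketch below. The `DAG` dataclass here is a stand-in for `airflow.DAG`, and JSON is used instead of YAML, purely to keep the sketch self-contained; in a real DAG file you would use `airflow.DAG` and `yaml.safe_load`:]

```python
import json
from dataclasses import dataclass
from pathlib import Path


@dataclass
class DAG:
    """Stand-in for airflow.DAG so this sketch runs without Airflow."""
    dag_id: str
    schedule: str | None = None


def collect_dags(config_dir: Path, namespace: dict) -> None:
    """Read every config file in config_dir and register one DAG per file
    at module level, where Airflow's DAG file processor would find it.
    This is the "loader file" that the code view would point at."""
    for path in sorted(config_dir.glob("*.json")):
        spec = json.loads(path.read_text())
        dag = DAG(dag_id=spec["dag_id"], schedule=spec.get("schedule"))
        # In a real DAG file this would be globals()[dag.dag_id] = dag,
        # since Airflow discovers module-level DAG objects.
        namespace[dag.dag_id] = dag
```

[This also illustrates the timeout concern: every config file is parsed inside one loader file's processing window.]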
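[Siddharth's suggested builder approach could be sketched roughly as below. `load_dag_from_config` and the JSON-only parsing are illustrative, not code from the linked commit (which modifies dagbag.py to do this natively); the contract is just the *builder* key plus a `make_dag` function:]

```python
import importlib.util
import json
from pathlib import Path


def load_dag_from_config(config_path: Path):
    """Parse a DAG config file, import the Python module named by its
    'builder' key, and call make_dag(config) - mirroring steps 1-3 of
    the suggested approach. JSON is used here to avoid a dependency;
    a real implementation would accept YAML too (yaml.safe_load)."""
    config = json.loads(config_path.read_text())

    # Resolve the builder path relative to the config file.
    builder_path = Path(config["builder"])
    if not builder_path.is_absolute():
        builder_path = config_path.parent / builder_path

    # Import the builder module from its file path.
    spec = importlib.util.spec_from_file_location(builder_path.stem, builder_path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)

    # The builder receives the parsed config and returns a DAG object.
    return module.make_dag(config)
```

[Because the scheduler would treat each config file as the "DAG file", each generated DAG naturally gets its own processing timeout and log file - which is the scalability point made above.]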
