Hello Andrey,

I think Maxime and I both asked some important questions. If you want to proceed with the donation, it would be great if you let us know what you think about the issues we mentioned. I also know that Michael, whom I met at the workshops in Berlin, was very interested in this, so maybe you can take part in the discussion. If you are willing to donate the code and continue the discussion on it, I think we have to start, well... discussing :).
I just copied our points below to make it easier to answer both of us at the same time:

Jarek:

1) Is the CWL package more of a converter from CWL to Python DAG files (that can then be scheduled as usual), or does it run alongside the scheduler and schedule tasks and operators separately using a different scheduling engine? As a reference, there is https://github.com/GoogleCloudPlatform/oozie-to-airflow, a converter from Oozie XML to Airflow DAGs. I think the biggest advantage of Airflow is being able to modify and iterate quickly using Python code, so having a Python DAG generated from CWL might be a good idea. Even if it is not perfect, the user can still modify it and extend it later manually rather than relying on all the features of CWL being implemented.

2) I'd also like to understand what dependencies it introduces on Airflow: does it rely on certain internals of Airflow that could make Airflow's evolution more difficult? Also, we already have a roadmap for Airflow 2.0, and certain incompatibilities are already implemented, more are planned (and more, not planned yet, are to come). Is the CWL importer compatible with 1.10 only, or with both 1.10 and (the current state of) 2.0? Have you been following some of the discussions about 2.0 and are you aware of any potential incompatibilities?

3) What benefits do you see in having the Airflow CWL package managed by the Airflow community rather than the CWL one? It could work both ways: it could be managed by either of the communities (as usual in the case of such imports), but I think it has to be carefully weighed who maintains it eventually. It all depends on how much one can rely on the other, what the release cycle of new CWL versions is vs. Airflow versions, etc. Could you share your thought process and why you think it should be part of Airflow?

Maxime:

4) Personally I like the idea of an ecosystem of packages (and repos) managed and maintained by their specialist.
That way they can have their own CI, their own release processes and cycles, and "namespaced" notifications. If anything I'd rather push in the direction of breaking Airflow into many smaller packages (core, scheduler, web, ...) as opposed to tacking other projects on top of it.

5) Also arguably Airflow's DSL may be more "common" than CWL. Clearly CWL has more focussed intentions around creating something universal, but to me that doesn't necessarily make it more legitimate or common than other specs (Oozie, Azkaban, Informatica, ...) and should be treated similarly (would we want to include extensions to all these as part of Airflow?).

6) I also prefer the codegen/migration approach (I think the `oozie-to-airflow` tool does that) to allow a path that resolves the common denominator limitations. How can this tooling expose features that are proper to Airflow (pools, priority weights, xcoms, callbacks!, ...)?

J.

On Thu, Oct 31, 2019 at 1:57 AM Maxime Beauchemin <[email protected]> wrote:

> As someone who has spent a lot of time acting as a maintainer, a code
> "donation" seems like dangerous gift to accept.
>
> Personally I like the idea of an ecosystem of packages (and repos) managed
> and maintained by their specialist. That way they can have their own CI,
> their own release processes and cycles, and "namespaced" notifications. If
> anything I'd rather push in the direction of breaking Airflow into many
> smaller packages (core, scheduler, web, ...) as opposed to tacking other
> projects on top of it.
>
> Also arguably Airflow's DSL may be more "common" than CWL. Clearly CWL has
> more focussed intentions around creating something universal, but to me
> that doesn't necessarily make it more legitimate or common than other specs
> (Oozie, Azkaban, Informatica, ...) and should be treated similarly (would
> we want to include extensions to all these as part of Airflow?).
>
> I also prefer the codegen/migration approach (I think the
> `oozie-to-airflow` tool does that) to allow a path that resolves the common
> denominator limitations. How can this tooling expose features that are
> proper to Airflow (pools, priority weights, xcoms, callbacks!, ...)?
>
> Max
>
> On Wed, Oct 30, 2019 at 12:32 PM Andrey Kartashov <[email protected]>
> wrote:
>
> > My name is Andrey and I'm the developer behind CWL-Airflow.
> > This message is a follow-up to a Slack conversation. I have copied and
> > pasted some messages from there here.
> >
> >
> > >> Slack chat:
> >
> > When I met the CWL team there were no pipeline managers that supported
> > it. I picked up Airflow just to prove the concept that it is possible.
> >
> > At the same time I was looking for a pipeline manager to use for
> > bioinformatic analysis and asked the Airflow team tons of questions; as
> > a result there is a special note in the documentation: "Beyond the
> > Horizon". Nevertheless, I adopted Airflow for our bioinformatic use.
> >
> > There are more than 200 different pipeline managers, and to believe that
> > in the nearest future there will be only one, perfect one sounds
> > impossible. So, to exchange pipeline logic between different pipeline
> > managers and people it is good to have a standard (CWL is a perfect fit),
> > like the JavaScript standard and its different executors, browsers...
> >
> > Apache Taverna (a pipeline manager) has been working on adopting CWL for
> > a while now; we have code and it is already working.
> >
> > So yes, CWL-Airflow is developed and its use is simple: it extends the
> > Airflow DAG class. However, it is still required to put a .py file with
> > a DAG (CWLDAG in our case) into the DAG directory. I would like to just
> > put a .cwl file into the DAG directory to simplify the usage.
> >
> > I'm ready to develop what is necessary, but I'm not quite sure (I'm not a
> > big expert in Airflow code) which way to go: a plugin, or some native core
> > code, or ...
> >
> > The project itself lives at https://github.com/Barski-lab/cwl-airflow
> > and there are tons of CWL tests at
> > https://ci.commonwl.org/job/airflow-conformance/
>

--
Jarek Potiuk
Polidea <https://www.polidea.com/> | Principal Software Engineer
M: +48 660 796 129
