Max made some great points on my dataflow PR, and I wanted to continue the discussion here so that it is visible to all.
While I think my dataflow implementation contains the basic requirements for any more complicated extension (but that conversation can wait!), I had to implement it by adding some very specific "dataflow-only" code to core Operator logic. In retrospect, that gives me pause (as, I believe, it did for Max).

After thinking about it for a few days, what I really want to do is propose a very small change to core Airflow: change BaseOperator.post_execute(context) to BaseOperator.post_execute(result, context). I think the pre_execute and post_execute hooks have generally been an afterthought, but with that change (which, I think, is reasonable in and of itself) I could implement dataflow entirely through those hooks.

So that brings me to my next point: if the hook is changed, I could happily drop a reworked dataflow implementation into contrib, rather than core. That would alleviate some of the pressure for Airflow to officially decide whether it's the right implementation or not (it is! :) ). I feel like that would be the optimal situation at the moment.

And that brings me to my final point: the future of "contrib" and the Airflow community. Having contrib in the core Airflow repo has some advantages:

- standardized access
- centralized repository for PRs
- at least a style review (if not unit tests) from the committers

But some big disadvantages as well:

- Very complicated dependency management [presumably, most contrib operators need to add an extras_require entry for their specific dependencies]
- No sense of ownership or even an easy way to raise issues (due to the friction of opening JIRA tickets vs GitHub issues)

One thought is to move the contrib directory to its own repo, which would keep the advantages but remove the disadvantages from core Airflow. Another is to encourage individual Airflow repos (Airflow-Docker, Airflow-Dataflow, Airflow-YourExtensionHere) which could be installed a la carte. That would leave maintenance up to the original author, but could lead to some fracturing in the community as discovery becomes difficult.
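
To make the post_execute proposal a bit more concrete, here is a rough sketch of how I imagine it working. Note that the BaseOperator below is a simplified stand-in, not Airflow's actual class; the run() driver, the DataflowMixin, the AddOne operator, and the "dataflow" key in the context are all hypothetical names made up for illustration. The only point is that once the hook receives the result, the dataflow logic can live entirely outside core.

```python
class BaseOperator:
    """Simplified stand-in for Airflow's BaseOperator (NOT the real class)."""

    def __init__(self, task_id):
        self.task_id = task_id

    def pre_execute(self, context):
        """Hook called right before execute(); a no-op by default."""

    def execute(self, context):
        raise NotImplementedError

    def post_execute(self, result, context):
        """Proposed signature: the hook also receives execute()'s return value."""

    def run(self, context):
        # Hypothetical driver standing in for whatever calls the hooks in core.
        self.pre_execute(context)
        result = self.execute(context)
        self.post_execute(result, context)
        return result


class DataflowMixin:
    """Hypothetical contrib-side mixin: all dataflow logic lives in the hook."""

    def post_execute(self, result, context):
        # Publish the result under this task's id in a shared store.
        context.setdefault("dataflow", {})[self.task_id] = result
        super().post_execute(result, context)


class AddOne(DataflowMixin, BaseOperator):
    """Toy operator that adds one to an upstream result (0 if there is none)."""

    def __init__(self, task_id, upstream_id=None):
        super().__init__(task_id)
        self.upstream_id = upstream_id

    def execute(self, context):
        upstream = context.get("dataflow", {}).get(self.upstream_id, 0)
        return upstream + 1


context = {}
AddOne("first").run(context)
AddOne("second", upstream_id="first").run(context)
print(context["dataflow"])  # {'first': 1, 'second': 2}
```

With the current post_execute(context) signature, the mixin would have no way to see what execute() returned, which is why I ended up touching core Operator logic in the PR.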
