Hi Vijay,

Until recently we assumed that people already had their own way of syncing GitHub repos on their infrastructure. At Airbnb we use Chef; pretty much every company has its own way of doing this, and it is a requirement for distributed Airflow.

A related item on our roadmap is to add version semantics (git SHAs) to the communication layer, so that workers fetch shallow clones of the DAG repository as of a specific version. We debated using some form of serialization versus this approach and decided to fully embrace configuration as code, and to shy away from serialization / artifact management, which brings many challenges and limitations, especially in Python.

Once this change rolls out, Airflow won't rely on external services to sync repos, and we'll have a solid story around versioning. Of course, that implies that Git becomes a critical hotspot in the cluster. We're planning to ship this feature as opt-in, at least until 2.0.

We'll share a formal design doc with the community in the near future. In the meantime, this thread can be a good place to discuss the solution at a high level.

Thanks,

Max

On Wed, Sep 7, 2016 at 3:25 PM, Vijay Bhat <[email protected]> wrote:

> Hi all,
>
> First off, I want to thank the Airflow community for developing a fantastic
> data pipelining platform. I used Dataswarm extensively while I was at
> Facebook, and it's awesome to see most of the functionality available for
> the rest of the world to use in the form of Airflow.
>
> What I haven't found in the documentation is a prescribed way to connect
> the source control repo for the DAG code to the Airflow DAG folder to make
> sure the latest code changes are picked up by the scheduler. In the Airflow
> forums, I have seen people mention using cron / chef / puppet etc, but no
> git webhook (https://developer.github.com/v3/repos/hooks/) based methods.
>
> Using webhooks would prevent the need to poll the repo for changes. For
> example, Jenkins uses webhooks to auto-trigger builds -
> https://wiki.jenkins-ci.org/display/JENKINS/Github+Plugin#GithubPlugin-TriggerabuildwhenachangeispushedtoGitHub.
> Does Airflow have a way of configuring something similar?
>
> Thanks!
> Vijay
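
To make the "shallow clone as of a specific version" idea from the reply concrete, here is a rough sketch of how a worker might fetch a single commit by SHA. It is not Airflow code and not the planned design: the function names, the repository URL, and the destination path are all hypothetical, and fetching a bare SHA only works when the Git server permits it (many hosted services do, but that is an assumption here).

```python
import subprocess
from pathlib import Path


def shallow_checkout_commands(repo_url, sha, dest):
    """Return the git commands that fetch exactly one commit into dest.

    Hypothetical helper for illustration; not part of Airflow.
    """
    git = ["git", "-C", str(dest)]
    return [
        # Start from an empty repository instead of a full clone.
        ["git", "init", "--quiet", str(dest)],
        git + ["remote", "add", "origin", repo_url],
        # --depth 1 limits the fetch to the requested commit only.
        git + ["fetch", "--quiet", "--depth", "1", "origin", sha],
        # Check out the fetched commit in detached-HEAD state.
        git + ["checkout", "--quiet", "--detach", "FETCH_HEAD"],
    ]


def shallow_checkout(repo_url, sha, dest):
    """Run the commands above, raising if any git call fails."""
    Path(dest).mkdir(parents=True, exist_ok=True)
    for cmd in shallow_checkout_commands(repo_url, sha, dest):
        subprocess.run(cmd, check=True)
```

A worker would call something like `shallow_checkout(repo_url, sha, dest)` with the SHA it received alongside the task. Since every worker hits the Git server this way, the sketch also shows why Git becomes a hotspot in the cluster.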
