Hi Max,

That's very helpful. Look forward to the version semantics features in
Airflow. Until then I will use chef (or alternatives).

Thanks,
Vijay

On Thu, Sep 8, 2016 at 8:39 AM, Maxime Beauchemin <[email protected]> wrote:

> Hi Vijay,
>
> Up until recently we had the assumption that people had already their own
> way of syncing GH repos on their infrastructure. In our case at Airbnb it's
> chef, and pretty much every company has their own way of doing this and is
> a requirement for distributed Airflow.
>
> A related item on our roadmap is to allow for adding version semantics (git
> SHAs) in the communication layer so that workers would fetch shallow clones
> of the DAG repository as of a specific version. We were debating on using
> some form of serialization versus this approach and decided to fully
> embrace configuration as code, and shy away from the serialization /
> artifact management which brings in many challenges and limitations,
> especially in Python.
>
> As we roll this change out, Airflow won't rely on external services to sync
> up repos, and we'll have a solid story around versioning. Of course that
> implies that Git becomes a critical hotspot in the cluster. We're planning
> to ship this feature as opt-in, at least until 2.0
>
> To the community, we'll share a formal design doc in the near future, in
> the meantime this thread can be a good place for discussing this solution
> at a high level.
>
> Thanks,
>
> Max
>
> On Wed, Sep 7, 2016 at 3:25 PM, Vijay Bhat <[email protected]> wrote:
>
> > Hi all,
> >
> > First off, I want to thank the Airflow community for developing a
> > fantastic data pipelining platform. I used Dataswarm extensively while
> > I was at Facebook, and it's awesome to see most of the functionality
> > available for the rest of the world to use in the form of Airflow.
> >
> > What I haven't found in the documentation is a prescribed way to connect
> > the source control repo for the DAG code to the Airflow DAG folder to
> > make sure the latest code changes are picked up by the scheduler. In the
> > Airflow forums, I have seen people mention using cron / chef / puppet
> > etc, but no git webhook (https://developer.github.com/v3/repos/hooks/)
> > based methods.
> >
> > Using webhooks would prevent the need to poll the repo for changes. For
> > example, Jenkins uses webhooks to auto-trigger builds -
> > https://wiki.jenkins-ci.org/display/JENKINS/Github+Plugin#GithubPlugin-TriggerabuildwhenachangeispushedtoGitHub.
> >
> > Does Airflow have a way of configuring something similar?
> >
> > Thanks!
> > Vijay
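
For anyone who wants to try the webhook route discussed in this thread while it is not built into Airflow, here is a minimal sketch of the receiving side. It is not an Airflow feature and not anyone's production setup. It assumes GitHub push events signed with a shared secret and delivered with the `X-Hub-Signature-256` header, and it assumes the DAG folder is a git checkout. The function names, the `master` default branch, and the `dags_folder` argument are all hypothetical.

```python
import hashlib
import hmac
import json
import subprocess


def signature_is_valid(secret: bytes, body: bytes, header: str) -> bool:
    """Check the HMAC-SHA256 signature GitHub sends with a webhook delivery."""
    expected = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()
    # Constant-time comparison; a missing header is treated as invalid.
    return hmac.compare_digest(expected.encode(), (header or "").encode())


def should_sync(secret: bytes, body: bytes, header: str, branch: str = "master") -> bool:
    """Return True only for an authentic push event on the tracked branch."""
    if not signature_is_valid(secret, body, header):
        return False
    payload = json.loads(body)
    return payload.get("ref") == f"refs/heads/{branch}"


def sync_dags(dags_folder: str) -> None:
    """Fast-forward the DAG checkout (assumes dags_folder is a git clone)."""
    subprocess.run(["git", "-C", dags_folder, "pull", "--ff-only"], check=True)
```

A small HTTP endpoint on each Airflow host would pass the raw request body and the signature header to `should_sync` and call `sync_dags` when it returns True. This replaces polling with a push, but every host that reads the DAG folder still needs to receive the event, which is the distribution problem chef or puppet handles today.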
