Hi Max,

That's very helpful. I look forward to the version semantics features in
Airflow. Until then, I will use Chef (or an alternative).

Thanks,
Vijay

On Thu, Sep 8, 2016 at 8:39 AM, Maxime Beauchemin <
[email protected]> wrote:

> Hi Vijay,
>
> Up until recently we assumed that people already had their own way of
> syncing GH repos on their infrastructure. In our case at Airbnb it's Chef.
> Pretty much every company has its own way of doing this, and it is a
> requirement for distributed Airflow.
>
> A related item on our roadmap is to allow for adding version semantics (git
> SHAs) in the communication layer so that workers would fetch shallow clones
> of the DAG repository as of a specific version. We debated using some form
> of serialization versus this approach and decided to fully embrace
> configuration as code, and to shy away from serialization / artifact
> management, which brings in many challenges and limitations, especially in
> Python.
>
> As we roll this change out, Airflow won't rely on external services to sync
> up repos, and we'll have a solid story around versioning. Of course that
> implies that Git becomes a critical hotspot in the cluster. We're planning
> to ship this feature as opt-in, at least until 2.0.
>
> To the community: we'll share a formal design doc in the near future. In
> the meantime, this thread can be a good place for discussing this solution
> at a high level.
>
> Thanks,
>
> Max
>
> On Wed, Sep 7, 2016 at 3:25 PM, Vijay Bhat <[email protected]> wrote:
>
> > Hi all,
> >
> > First off, I want to thank the Airflow community for developing a
> > fantastic data pipelining platform. I used Dataswarm extensively while I
> > was at Facebook, and it's awesome to see most of the functionality
> > available for the rest of the world to use in the form of Airflow.
> >
> > What I haven't found in the documentation is a prescribed way to connect
> > the source control repo for the DAG code to the Airflow DAG folder to make
> > sure the latest code changes are picked up by the scheduler. In the Airflow
> > forums, I have seen people mention using cron / chef / puppet etc, but no
> > git webhook (https://developer.github.com/v3/repos/hooks/) based methods.
> >
> > Using webhooks would prevent the need to poll the repo for changes. For
> > example, Jenkins uses webhooks to auto-trigger builds -
> > https://wiki.jenkins-ci.org/display/JENKINS/Github+Plugin#GithubPlugin-TriggerabuildwhenachangeispushedtoGitHub.
> > Does Airflow have a way of configuring something similar?
> >
> > Thanks!
> > Vijay
> >
>
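
For readers who want to try the webhook approach asked about in the original
message, here is a minimal sketch. It is not an Airflow feature and nothing in
this thread describes such a feature; it only shows the core of a receiver:
check the HMAC signature that GitHub attaches to a delivery (the
"sha256=<hexdigest>" form sent in the X-Hub-Signature-256 header) and, if it
matches, pull the DAG checkout. The function names, the dags_dir argument, and
the injectable run parameter are hypothetical and exist only for illustration.
Wiring this into an HTTP endpoint is left out.

```python
import hashlib
import hmac
import subprocess


def signature_is_valid(secret: bytes, body: bytes, header_value: str) -> bool:
    """Check a 'sha256=<hexdigest>' HMAC signature against the raw request body."""
    expected = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()
    # Constant-time comparison, done on bytes so any header string is accepted.
    return hmac.compare_digest(expected.encode(), header_value.encode())


def sync_dags_on_push(secret, body, header_value, dags_dir,
                      run=subprocess.check_call):
    """Pull the DAG repo if the webhook request is authentic.

    Returns True if a pull was run, False if the request was rejected.
    `run` is injectable so the logic can be exercised without calling git.
    """
    if not signature_is_valid(secret, body, header_value):
        return False
    # Fast-forward only, so a diverged checkout fails loudly instead of merging.
    run(["git", "-C", dags_dir, "pull", "--ff-only"])
    return True
```

Every scheduler and worker host would need to run something like this (or be
reached by it), which is the same distribution problem that Chef or cron
solves today.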

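The roadmap item above mentions workers fetching shallow clones of the DAG
repository as of a specific git SHA. The thread does not say how Airflow would
implement this, so the following is only a sketch of one way to do it with
plain git commands: initialize an empty repository, fetch a single commit at
depth 1, and check it out. The function names and arguments are hypothetical.
Fetching by SHA also assumes the git server permits requesting a commit that
is not a branch tip, which depends on server configuration.

```python
import subprocess


def shallow_checkout_commands(repo_url, sha, dest):
    """Return the git commands that check out exactly one commit, history-free."""
    return [
        ["git", "init", "--quiet", dest],
        ["git", "-C", dest, "remote", "add", "origin", repo_url],
        # Depth 1 fetches only the requested commit, not its ancestors.
        ["git", "-C", dest, "fetch", "--depth", "1", "origin", sha],
        # FETCH_HEAD points at the commit just fetched (detached HEAD).
        ["git", "-C", dest, "checkout", "--quiet", "FETCH_HEAD"],
    ]


def shallow_checkout(repo_url, sha, dest, run=subprocess.check_call):
    """Run the commands in order; `run` is injectable for dry runs."""
    for cmd in shallow_checkout_commands(repo_url, sha, dest):
        run(cmd)
```

Passing run=print gives a dry run that lists the commands without touching
the network.
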