We use Docker at Edmodo, and it has really helped with Airflow.

Running "pip install airflow" is easy by itself, but some of the database
drivers are pip installs that in turn need the dev versions of host .rpm or
.deb packages, because they want a .h file to compile against.
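
As a rough illustration of the driver problem (a sketch, not our actual
tooling), the Python check below reports which database driver modules can be
found in the current environment. The backend-to-module mapping is an
assumption chosen for the example, and `missing_drivers` is a hypothetical
helper name. psycopg2 and MySQLdb are the usual import names for the Postgres
and MySQL drivers, and both are typically compiled against host development
headers when built from source.

```python
import importlib.util

# Assumed mapping for illustration only; adjust to the backends you use.
DRIVER_MODULES = {
    "postgres": "psycopg2",
    "mysql": "MySQLdb",
    "sqlite": "sqlite3",
}


def missing_drivers(modules=DRIVER_MODULES):
    """Return the sorted backend names whose driver module cannot be found."""
    return sorted(
        backend
        for backend, module in modules.items()
        # find_spec returns None when a top-level module is not installed.
        if importlib.util.find_spec(module) is None
    )


if __name__ == "__main__":
    missing = missing_drivers()
    if missing:
        print("Drivers not installed for:", ", ".join(missing))
    else:
        print("All listed drivers are importable.")
```

Running something like this inside the image at build time is one way to
catch a driver that failed to compile before the container ships.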

We are porting a large, complex Hadoop-based ETL to Airflow and have used
Docker to package web services that we call from Airflow.

Another part of our system is that we want to set up Amazon Auto Scaling
Groups (ASG) to launch more Airflow executor servers when our main server
becomes overloaded. We run a few large-memory Java jobs and this will be a
problem soon. Our tooling lets us easily set this up with Docker. (We wrote
something just like Docker Compose that talks to ASG. It's incredibly
useful.)

So, yeah, "pip install airflow" is fine for kicking the tires but we needed
binary management rather quickly after that.

Cheers,

Lance

On Wed, May 4, 2016 at 1:28 PM, Chris Riccomini <[email protected]>
wrote:

> > As far as ease of use, while docker is definitely getting more popular,
> > it is hard to beat the current pip install flow for people not quite up
> > to date on how to setup docker. It seems like one more hurdle if you just
> > want to get started.
>
> Strongly agree. We tried to use Vagrant and then Docker with a prior
> project, and it was a pain. Another project that I'm working with now uses
> Docker for its hello-world stuff, and it's really troublesome. You will get
> WAY more questions if you go this route than the current simple pip/sqlite
> route.
>
> On Wed, May 4, 2016 at 12:27 PM, Maxime Beauchemin <
> [email protected]> wrote:
>
> > Yeah I'd be curious to see how the Docker setup instructions (from
> > scratch) would compare to the current ones.
> >
> > On Wed, May 4, 2016 at 11:05 AM, Arthur Wiedmer <[email protected]>
> > wrote:
> >
> > > +1, but it feels like just piling on.
> > >
> > > One thing we could consider is which part we would like to fix.
> > >
> > > - If it is the seriousness/production ready db, but that is still a
> > > local db/client, we could try something like firebird.
> > > Relatively small footprint and can do multithreading, it is supported
> > > by SQLAlchemy, though it is not as easy to install as sqlite on most
> > > *nixes.
> > > We could spend some cycles baking this into containers as well.
> > >
> > > - As far as ease of use, while docker is definitely getting more
> > > popular, it is hard to beat the current pip install flow for people not
> > > quite up to date on how to setup docker. It seems like one more hurdle
> > > if you just want to get started.
> > >
> > > Best,
> > > Arthur
> > >
> > >
> > > On Wed, May 4, 2016 at 9:35 AM, Maxime Beauchemin <
> > > [email protected]> wrote:
> > >
> > > > Making it frictionless for people to get their feet wet is extremely
> > > > important. It's been a requirement since the early prototypes and I
> > > > feel strongly about keeping it that way. It's hard to test this
> > > > hypothesis, but it could be a defining factor in the success of this
> > > > project (to-date and future).
> > > >
> > > > Docker may allow for more batteries to be included and offer even
> > > > less friction than the `pip install` path for folks who are familiar
> > > > with it. I'd have to look to see if the community contributed Docker
> > > > images are up to date. We may want to make that "the way to go" and
> > > > change the tutorial / quick start instructions to reflect that if it
> > > > makes sense. That may require integrating the burning of images as
> > > > part of the build and/or release process.
> > > >
> > > > Max
> > > >
> > > > On Wed, May 4, 2016 at 6:33 AM, Jeremiah Lowin <[email protected]>
> > > > wrote:
> > > >
> > > > > +1, shipping Airflow "batteries included" is very important in my
> > > > > opinion. There is a lot to grok and the easiest way to learn is by
> > > > > letting folks spin up a working installation right away.
> > > > > Unfortunately I don't think there's a viable alternative to SQLite
> > > > > that is also supported by SQLAlchemy.
> > > > >
> > > > > On Wed, May 4, 2016 at 2:57 AM Prateek Rungta <[email protected]>
> > > > > wrote:
> > > > >
> > > > > > It's documented pretty well that it's only for people to get
> > > > > > their feet wet with. From the quickstart
> > > > > > <http://pythonhosted.org/airflow/start.html?highlight=sqlite>:
> > > > > >
> > > > > > Out of the box, Airflow uses a sqlite database, which you should
> > > > > > outgrow fairly quickly since no parallelization is possible using
> > > > > > this database backend. It works in conjunction with the
> > > > > > SequentialExecutor which will only run task instances
> > > > > > sequentially. While this is very limiting, it allows you to get
> > > > > > up and running quickly and take a tour of the UI and the command
> > > > > > line utilities.
> > > > > >
> > > > > > FWIW, I'm now on day 2 of using Airflow. And while I wouldn't
> > > > > > dream of deploying Airflow using SQLite beyond my laptop, I quite
> > > > > > appreciated being able to mess with Airflow without any of the
> > > > > > infrastructural constraints.
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Tue, May 3, 2016 at 11:18 PM, Siddharth Anand <[email protected]>
> > > > > > wrote:
> > > > > >
> > > > > > > From time to time, we run into bugs with the SQLite dialect in
> > > > > > > SQLAlchemy and close the bugs as "wont-fix" because we don't
> > > > > > > want to be in the business of fixing such bugs. We deem SQLite
> > > > > > > as a "non-serious" database that no one [in his/her right mind]
> > > > > > > would run in his/her staging, qa, or production environments.
> > > > > > > However, we rely on the SequentialExecutor and the SQLite DB
> > > > > > > for our tests.
> > > > > > > What should we do with SQLite? Should we lift up the hood and
> > > > > > > fix it for our needs or find either a different ORM or a
> > > > > > > different option for DB backend?
> > > > > > > Examples of bugs we encounter and close as won't fix:
> > > > > > > 1. Deleting a task instance:
> > > > > > > https://github.com/airbnb/airflow/issues/955
> > > > > > > 2. Weird pickle issue:
> > > > > > > https://issues.apache.org/jira/browse/AIRFLOW-46
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>



-- 
Lance Norskog
[email protected]
Redwood City, CA
