Operators as an abstraction for something like R tend to be more
restrictive than useful. Similarly, it's hard to write a useful
SparkOperator: it will typically just fetch an artifact and fire it up,
and people store artifacts in different ways, so there's not much to
generalize.

That said, if there is a set of common patterns you use R for and want to
parameterize and abstract out, or "industrialize", then specific operators
can be useful. Something like "FetchFromS3andRankROperator" makes more
sense than a generic ROperator(script), which would be a very thin wrapper
around BashOperator.

Such operators are usually tied to your environment and can be defined and
reused within your DAG repository.
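
To make the contrast between a generic ROperator(script) and a
pattern-specific operator a bit more concrete, here is a minimal sketch in
plain Python. It only builds the shell command; it does not import Airflow.
Everything named here is an assumption for illustration: the helper names,
the script path and the --name=value argument convention are hypothetical,
and the R script itself would have to parse those arguments.

```python
import shlex


def rscript_command(script_path, **params):
    """Build a shell command that runs an R script with --name=value args."""
    args = ["Rscript", "--vanilla", script_path]
    # Sort the parameters so the generated command is deterministic.
    for name, value in sorted(params.items()):
        args.append("--{}={}".format(name, value))
    # Quote every token so values containing spaces survive the shell.
    return " ".join(shlex.quote(arg) for arg in args)


# Hypothetical, environment-specific script location.
RANK_SCRIPT = "/opt/r_jobs/fetch_and_rank.R"


def fetch_from_s3_and_rank_command(bucket, key, top_n=10):
    """Expose only the parameters this one pattern needs."""
    return rscript_command(RANK_SCRIPT, bucket=bucket, key=key, top_n=top_n)


if __name__ == "__main__":
    print(fetch_from_s3_and_rank_command("my-bucket", "scores/input.csv"))
```

The generic helper is roughly what a generic ROperator would amount to,
while the second function is where the environment-specific value lives. In
a DAG, the resulting string could be passed as the bash_command of a
BashOperator, or wrapped in a small operator subclass kept in your own
repository.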

I don't want to start a flame war here, but there's a bigger question of
whether you want to allow running R in production at all. It's dangerous
for many reasons that I won't get into here unless we decide to have this
conversation. Regardless, we do use R in production at Airbnb, and I would
recommend using the cgroup features in Airflow and having a dedicated queue
of workers to insulate against abuse and contain resource utilisation. I'd
also recommend publishing a set of internal rules, "When is it OK to use R
in production", and having engineers do some gatekeeping in source control.

You may also want to consider SparkR as a path to productionizing R,
though in my experience data scientists tend to find it too restrictive, as
it doesn't have the bells, whistles and trumpets that desktop R has.

Max

On Thu, Jul 13, 2017 at 7:32 AM, Scott Halgrim <[email protected]>
wrote:

> This doesn’t really answer your question, but for what it’s worth,
> virtually our entire pipeline is written in R. We use BashOperators to call
> a templated Rscript call.
>
> On Jul 13, 2017, 6:21 AM -0700, Andrew Maguire <[email protected]>,
> wrote:
> > Hey,
> >
> > I'm sure this has been asked hundreds of times before.
> >
> > Are there any plans for adding R script operators?
> >
> > Looked around the contrib part of the code base but couldn't find anything.
> >
> > Found some tickets in the JIRA, but they seemed to be from around 2014
> > and maybe for stuff that has since been removed.
> >
> > I'm porting lots of jobs over to airflow and just trying to assess if
> > it's worth redoing them in python, maybe calling them with bash
> > operators, or just leaving them in my cron jobs for now.
> >
> > Would be happy to help out testing or reviewing anything in any way if
> > there are efforts ongoing.
> >
> > Cheers,
> > Andy
>
