Operators as an abstraction for something like R tend to be more restrictive than useful. Similarly, it's hard to write a useful SparkOperator because it would typically just fetch an artifact and fire it up, and people have different ways of storing artifacts, so there's not much to generalize.
That said, if there is a set of common patterns you use R for and want to parameterize and abstract out (or "industrialize"), then specific operators can be useful. "FetchFromS3andRankROperator" or something like it makes more sense than a generic ROperator(script), which would be a very thin wrapper around BashOperator. These operators are usually specific to your environment and can be defined and reused within your DAG repository.

I don't want to start a flame war here, but there's a bigger question of whether you want to allow running R in production at all. It's dangerous for many reasons that I won't get into here unless we decide to have that conversation. Regardless, we do use R in production at Airbnb, and I would recommend using the cgroup features in Airflow and a dedicated queue of workers to insulate against abuse and contain resource utilisation. I'd also recommend publishing a set of internal rules ("When is it OK to use R in production?") and having engineers do some gatekeeping in source control. You may also want to consider SparkR as a path to productionizing R, though in my experience data scientists tend to find it too restrictive, as it doesn't have all the bells, whistles and trumpets of desktop R.

Max

On Thu, Jul 13, 2017 at 7:32 AM, Scott Halgrim <[email protected]> wrote:

> This doesn’t really answer your question, but for what it’s worth,
> virtually our entire pipeline is written in R. We use BashOperators to make
> a templated Rscript call.
>
> On Jul 13, 2017, 6:21 AM -0700, Andrew Maguire <[email protected]>,
> wrote:
> > Hey,
> >
> > I'm sure this has been asked hundreds of times before.
> >
> > Are there any plans for adding R script operators?
> >
> > I looked around the contrib part of the code base but couldn't find anything.
> >
> > I found some tickets in JIRA, but they seemed to be from around 2014 and
> > possibly for things that have since been removed.
> >
> > I'm porting lots of jobs over to Airflow and am trying to assess whether it's
> > worth redoing them in Python, maybe calling them with BashOperators, or just
> > leaving them in my cron jobs for now.
> >
> > I'd be happy to help out with testing or reviewing in any way if there are
> > ongoing efforts.
> >
> > Cheers,
> > Andy
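For reference, here is a minimal sketch of the approach Scott describes (a BashOperator running a templated Rscript call), which is also roughly all that a generic ROperator(script) would reduce to. It uses only the Python standard library so it runs without Airflow installed; the helper name `rscript_bash_command`, the script path and the queue name `r_workers` are hypothetical, while `{{ ds }}` is Airflow's built-in execution-date template variable.

```python
import shlex
from typing import Sequence


def rscript_bash_command(script_path: str, args: Sequence[str] = ()) -> str:
    """Build a shell command line that runs an R script via Rscript.

    Hypothetical helper: each token is shell-quoted so that paths and
    arguments containing spaces or shell metacharacters reach the R
    script unchanged.
    """
    tokens = ["Rscript", script_path, *args]
    return " ".join(shlex.quote(t) for t in tokens)


# Hypothetical script path. "{{ ds }}" is left as an Airflow template:
# Jinja replaces it with the execution date before bash parses the line.
cmd = rscript_bash_command("/opt/r_jobs/rank.R", ["--date", "{{ ds }}"])
print(cmd)  # → Rscript /opt/r_jobs/rank.R --date '{{ ds }}'

# In a DAG file (not executed here, since it needs Airflow), the string
# would be passed as the templated bash_command, optionally pinned to a
# dedicated worker queue as suggested above (queue name is hypothetical):
#   BashOperator(task_id="rank", bash_command=cmd, queue="r_workers", dag=dag)
```

Keeping the command builder as a plain function, rather than a custom operator class, means it can be shared across DAGs and tested on its own; an environment-specific operator like the one Max mentions could still wrap it later.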
