Cool, will bear that in mind. It's really only a handful of small scripts that use certain R packages for various reasons.

Cheers,
Andy

On Thu, 13 Jul 2017, 17:30 Maxime Beauchemin, <[email protected]> wrote:

> Operators as an abstraction for something like R tend to be more
> restrictive than useful. Similarly it's hard to write a useful
> SparkOperator because it will typically simply fetch an artifact and fire
> it up, and people have different ways of storing artifacts so there's not
> much to generalize.
>
> Though I could see that if there are a set of common patterns you use R for
> and want to parameterize and abstract out or "industrialize" then specific
> operators can be useful. "FetchFromS3andRankROperator" or something like
> that makes more sense than a generic ROperator(script) which would be a
> very thin wrapper around BashOperator.
>
> These specific operators are usually specific to your environment and can
> be defined and reused within your DAG repository.
>
> I don't want to start a flame war here but there's a bigger question on
> whether you want to allow running R in production. It's dangerous for many
> reasons that I won't get into here unless we decide to have this
> conversation. Regardless, we do use R in production at Airbnb and would
> recommend using the cgroup features in Airflow and having a dedicated queue
> of workers to insulate abuse and contain resource utilisation. I'd also
> recommend publishing a set of internal rules "When is it ok to use R in
> production" and have engineers do some gatekeeping in source control.
>
> You also may want to consider SparkR as a path to productionize R though
> from my experience data scientists tend to find it too restrictive as it
> doesn't have the bells, whistles and trumpets the desktop R has.
>
> Max
>
> On Thu, Jul 13, 2017 at 7:32 AM, Scott Halgrim <[email protected]>
> wrote:
>
> > This doesn’t really answer your question, but for what it’s worth,
> > virtually our entire pipeline is written in R. We use BashOperators to call
> > a templated Rscript call.
> >
> > On Jul 13, 2017, 6:21 AM -0700, Andrew Maguire <[email protected]>,
> > wrote:
> > > Hey,
> > >
> > > I'm sure this has been asked hundreds of times before.
> > >
> > > Are there any plans for adding R script operators?
> > >
> > > I looked around the contrib part of the code base but couldn't find
> > > anything.
> > >
> > > I found some tickets in the JIRA, but they seemed to be from around 2014
> > > and maybe for stuff that has since been removed.
> > >
> > > I'm porting lots of jobs over to Airflow and just trying to assess
> > > whether it's worth redoing them in Python, maybe calling them with
> > > BashOperators, or just leaving them in my cron jobs for now.
> > >
> > > I would be happy to help out testing or reviewing anything in any way if
> > > there are efforts ongoing.
> > >
> > > Cheers,
> > > Andy
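
For reference, the pattern Scott describes (a BashOperator calling a templated Rscript command) might look roughly like the sketch below. It only builds the command string, so it runs without Airflow installed. The helper name build_rscript_command, the script path and the arguments are hypothetical and not taken from the thread; {{ ds }} is Airflow's execution-date template macro, which Airflow fills in when it renders the operator's bash_command.

```python
import shlex


def build_rscript_command(script_path, *args):
    """Return a shell command string that runs an R script via Rscript.

    Hypothetical helper (not part of Airflow). Each argument is shell-quoted,
    so a Jinja placeholder such as {{ ds }} stays a single argument and is
    replaced when Airflow renders the BashOperator's bash_command.
    """
    parts = ["Rscript", script_path] + list(args)
    return " ".join(shlex.quote(str(part)) for part in parts)


# Assumed script path and arguments, for illustration only.
command = build_rscript_command("/opt/jobs/rank.R", "{{ ds }}", "--top-n", "10")
print(command)

# In a DAG file the string would then be handed to the operator, e.g.:
#   BashOperator(task_id="rank_in_r", bash_command=command, dag=dag)
```

Keeping the command construction in a small function like this is one way to reuse the same Rscript call across several tasks within a DAG repository, short of writing a dedicated operator.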
