dlamblin commented on a change in pull request #3115: [AIRFLOW-2193] Add ROperator for using R
URL: https://github.com/apache/airflow/pull/3115#discussion_r257906759
 
 

 ##########
 File path: airflow/contrib/operators/r_operator.py
 ##########
 @@ -0,0 +1,79 @@
+# -*- coding: utf-8 -*-
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from builtins import bytes
+import os
+from tempfile import NamedTemporaryFile
+
+from airflow.models import BaseOperator
+from airflow.utils.decorators import apply_defaults
+from airflow.utils.file import TemporaryDirectory
+
+import rpy2.robjects as robjects
+from rpy2.rinterface import RRuntimeError
+
+
+class ROperator(BaseOperator):
+    """
+    Execute an R script or command
+
+    :param r_command: The command or a reference to an R script (must have
+        '.r' extension) to be executed (templated)
+    :type r_command: string
+    :param output_encoding: encoding output from R (default: 'utf-8')
+    :type output_encoding: string
+
+    """
+
+    template_fields = ('r_command',)
+    template_ext = ('.r', '.R')
+    ui_color = '#C8D5E6'
+
+    @apply_defaults
+    def __init__(
+            self,
+            r_command,
+            output_encoding='utf-8',
+            *args, **kwargs):
+
+        super(ROperator, self).__init__(*args, **kwargs)
+        self.r_command = r_command
+        self.output_encoding = output_encoding
+
+    def execute(self, context):
+        """
+        Execute the R command or script in a temporary directory
+        """
+
+        with TemporaryDirectory(prefix='airflowtmp') as tmp_dir:
+            with NamedTemporaryFile(dir=tmp_dir, prefix=self.task_id) as f:
+
+                f.write(bytes(self.r_command, 'utf_8'))
+                f.flush()
+                fname = f.name
+                script_location = os.path.abspath(fname)
+
+                self.log.info("Temporary script location: %s", script_location)
+                self.log.info("Running command(s):\n%s", self.r_command)
+
+                try:
+                    res = robjects.r.source(fname, echo=False)
 
 Review comment:
   Just a comment with a question: is it uncommon to write R scripts that take 
arguments or use environment variables?
   
   The BashOperator has a similar limitation, but it is just flexible enough 
to take a script that was written to be called with arguments and pass it 
templated arguments, leaving the script as is.
   
   That is, you can write any of:
   ```python
   bash_command = 'some_command {{params.arg}}',  # this command is templated
   bash_command = 'script.sh',  # this file's contents are templated
   bash_command = 'script.sh {{params.arg}}',  # this file is executable; it is not templated but is passed a templated arg
   ```
   
   For R, this PR supports only the first two cases.
   
   People with pre-existing R scripts that use arguments will likely need to 
rewrite them with templating in mind, or fall back to calling their R scripts 
with BashOperator and templated parameters. Alternatively, this operator could 
have an `r_args` list parameter, but I don't really see a way that 
[source](https://stat.ethz.ch/R-manual/R-devel/library/base/html/source.html) 
can accept parameters.
   Also, 
[commandArgs](https://stat.ethz.ch/R-manual/R-devel/library/base/html/commandArgs.html)
 doesn't seem to be something that could be set up for the sourced script.
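
   The BashOperator fallback mentioned above could look roughly like the sketch below. `build_rscript_command` is a hypothetical helper (not part of Airflow or this PR), the script path is a placeholder, and it assumes `Rscript` is on the worker's PATH. The resulting string would be passed as `bash_command`, and the R script would read the values with `commandArgs(trailingOnly = TRUE)`.

```python
import shlex


def build_rscript_command(script_path, templated_args):
    """Build a bash_command string that runs an existing R script via Rscript.

    Hypothetical helper for illustration only.
    """
    # Quote the script path for the shell. The Jinja expressions are appended
    # untouched so that Airflow can still render them at run time.
    parts = ["Rscript", shlex.quote(script_path)] + list(templated_args)
    return " ".join(parts)


# Placeholder path and params; the string does not end in '.sh', so Airflow
# would treat it as a templated command rather than a template file.
bash_command = build_rscript_command(
    "/path/to/report.R", ["{{ ds }}", "{{ params.arg }}"]
)
print(bash_command)
```

   Note that the rendered values are not shell-quoted in this sketch, so arguments containing spaces or shell metacharacters would need extra care.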
   
   If you agree that most new and existing users of R in Airflow would end up 
needing to rewrite their R scripts for this operator, and that's an issue, then 
maybe some of this should be revisited (maybe in a later PR, really). While 
doing that, it would seem that also taking an `env` dict, to be applied with 
[Sys.setenv](https://stat.ethz.ch/R-manual/R-devel/library/base/html/Sys.setenv.html),
 could help with onboarding R tasks into Airflow DAGs.
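
   One way the `env` idea might work, as a rough sketch only: turn the dict into `Sys.setenv(...)` lines and prepend them to the text that gets written to the temporary file before it is sourced. The `env` parameter and the `build_setenv_preamble` helper are hypothetical; neither exists in this PR.

```python
import re

# Conservative check so the name can be used unquoted as an R argument name.
_ENV_NAME = re.compile(r"^[A-Za-z][A-Za-z0-9_]*$")


def r_string_literal(value):
    """Return value as a double-quoted R string literal."""
    # Escape backslashes first, then double quotes.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return '"{}"'.format(escaped)


def build_setenv_preamble(env):
    """Build R code that sets each entry of env with Sys.setenv.

    Hypothetical helper for illustration only.
    """
    lines = []
    for name in sorted(env):
        if not _ENV_NAME.match(name):
            raise ValueError("unsupported environment variable name: %r" % name)
        value = r_string_literal(str(env[name]))
        lines.append("Sys.setenv({} = {})".format(name, value))
    return "\n".join(lines)


# The preamble goes ahead of the user's command or script contents.
r_command = 'cat(Sys.getenv("REPORT_DATE"))'
script = build_setenv_preamble({"REPORT_DATE": "2019-02-18"}) + "\n" + r_command
print(script)
```

   If `env` were also listed in `template_fields`, its values could presumably be templated like `r_command`, but that is an assumption about a parameter this operator does not have yet.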
