[jira] [Commented] (SPARK-13573) Open SparkR APIs (R package) to allow better 3rd party usage

Chip Senkbeil (JIRA) Tue, 01 Mar 2016 21:45:46 -0800

    [ 
https://issues.apache.org/jira/browse/SPARK-13573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15175079#comment-15175079
 ]


Chip Senkbeil commented on SPARK-13573:
---------------------------------------

[~sunrui], IIRC, Toree supported SparkR from 1.4.x and 1.5.x. Just a bit of a 
pain to keep in sync.

So, the process Toree uses to interact with SparkR is as follows:

# We added a SparkR.connect method 
(https://github.com/apache/incubator-toree/blob/master/sparkr-interpreter/src/main/resources/R/pkg/R/sparkR.R#L220)
 that uses the EXISTING_SPARKR_BACKEND_PORT environment variable to connect to 
an R backend but does not attempt to initialize the Spark Context
# We use the exposed callJStatic to acquire a reference to a Java (well, Scala) 
object that has additional variables like the Spark Context hanging off of it 
(https://github.com/apache/incubator-toree/blob/master/sparkr-interpreter/src/main/resources/kernelR/sparkr_runner.R#L50)
 {code}# Retrieve the bridge used to perform actions on the JVM
bridge <- callJStatic(
  "org.apache.toree.kernel.interpreter.sparkr.SparkRBridge", "sparkRBridge"
)

# Retrieve the state used to pull code off the JVM and push results back
state <- callJMethod(bridge, "state")

# Acquire the kernel API instance to expose
kernel <- callJMethod(bridge, "kernel")
assign("kernel", kernel, .runnerEnv){code}
# We then invoke methods using callJMethod to get the next string of R code to 
evaluate {code}# Load the container of the code
  codeContainer <- callJMethod(state, "nextCode")

  # If not valid result, wait 1 second and try again
  if (!inherits(codeContainer, "jobj")) {
    Sys.sleep(1)
    next()
  }

  # Retrieve the code id (for response) and code
  codeId <- callJMethod(codeContainer, "codeId")
  code <- callJMethod(codeContainer, "code"){code}
# Finally, we evaluate the acquired code string and send the results back to 
our running JVM (which represents a Jupyter kernel) {code}
  # Parse the code into an expression to be evaluated
  codeExpr <- parse(text = code)
  print(paste("Code expr", codeExpr))

  tryCatch({
    # Evaluate the code provided and capture the result as a string
    result <- capture.output(eval(codeExpr, envir = .runnerEnv))
    print(paste("Result type", class(result), length(result)))
    print(paste("Success", codeId, result))

    # Mark the execution as a success and send back the result
    # If output is null/empty, ensure that we can send it (otherwise fails)
    if (is.null(result) || length(result) <= 0) {
      print("Marking success with no output")
      callJMethod(state, "markSuccess", codeId)
    } else {
      # Clean the result before sending it back
      cleanedResult <- trimws(flatten(result, shouldTrim = FALSE))

      print(paste("Marking success with output:", cleanedResult))
      callJMethod(state, "markSuccess", codeId, cleanedResult)
    }
  }, error = function(ex) {
    # Mark the execution as a failure and send back the error
    print(paste("Failure", codeId, toString(ex)))
    callJMethod(state, "markFailure", codeId, toString(ex))
  }){code}
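
To make the flow above easier to follow without a running JVM, here is a minimal sketch of the same poll, evaluate, and report loop in plain R. It is not Toree's code. The names newMockState, mockCallJMethod, and runQueue are hypothetical stand-ins invented for illustration; only the method names nextCode, markSuccess, and markFailure come from the snippets above. The mock nextCode returns a plain list rather than a jobj, and the loop stops when the queue is empty instead of sleeping and retrying.

```r
# Hypothetical in-memory stand-in for the JVM-side state object.
# In Toree this is a Scala object reached through SparkR's callJMethod.
newMockState <- function(snippets) {
  state <- new.env()
  state$queue <- as.list(snippets)  # code strings waiting to be evaluated
  state$nextId <- 1L
  state$results <- list()           # outcome recorded per code id
  state
}

# Hypothetical stand-in for callJMethod: dispatches on the method name
# instead of making a call into the JVM.
mockCallJMethod <- function(state, method, ...) {
  args <- list(...)
  switch(method,
    nextCode = {
      if (length(state$queue) == 0) return(NULL)
      code <- state$queue[[1]]
      state$queue <- state$queue[-1]
      id <- paste0("code-", state$nextId)
      state$nextId <- state$nextId + 1L
      list(codeId = id, code = code)
    },
    markSuccess = {
      # The output argument is optional, mirroring the two call shapes above
      output <- if (length(args) >= 2) args[[2]] else ""
      state$results[[args[[1]]]] <- list(status = "success", output = output)
      invisible(NULL)
    },
    markFailure = {
      state$results[[args[[1]]]] <- list(status = "failure", output = args[[2]])
      invisible(NULL)
    },
    stop("unknown method: ", method)
  )
}

# Pull code from the state, evaluate it in a shared environment, and report
# the captured output (or the error message) back to the state.
runQueue <- function(state, envir = new.env()) {
  repeat {
    container <- mockCallJMethod(state, "nextCode")
    if (is.null(container)) break  # the real runner sleeps and retries here

    tryCatch({
      result <- capture.output(eval(parse(text = container$code), envir = envir))
      if (length(result) == 0) {
        mockCallJMethod(state, "markSuccess", container$codeId)
      } else {
        mockCallJMethod(state, "markSuccess", container$codeId,
                        paste(result, collapse = "\n"))
      }
    }, error = function(ex) {
      mockCallJMethod(state, "markFailure", container$codeId,
                      conditionMessage(ex))
    })
  }
  state$results
}

# Usage: three snippets sharing one environment; the last one fails.
results <- runQueue(newMockState(c("x <- 21", "x * 2", "stop('boom')")))
str(results)
```

Evaluating every snippet in one shared environment is what lets a later snippet see variables defined by an earlier one, which is the role .runnerEnv plays in the runner above.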

> Open SparkR APIs (R package) to allow better 3rd party usage
> ------------------------------------------------------------
>
>                 Key: SPARK-13573
>                 URL: https://issues.apache.org/jira/browse/SPARK-13573
>             Project: Spark
>          Issue Type: Improvement
>          Components: SparkR
>            Reporter: Chip Senkbeil
>
> Currently, SparkR's R package does not expose enough of its APIs to be used 
> flexibly. As far as I am aware, SparkR still requires you to create a new 
> SparkContext by invoking the sparkR.init method (so you cannot connect to a 
> running one) and there is no way to invoke custom Java methods using the 
> exposed SparkR API (unlike PySpark).
> We currently maintain a fork of SparkR that is used to power the R 
> implementation of Apache Toree, which is a gateway to use Apache Spark. This 
> fork provides a connect method (to use an existing Spark Context), exposes 
> needed methods like invokeJava (to be able to communicate with our JVM to 
> retrieve code to run, etc), and uses reflection to access 
> org.apache.spark.api.r.RBackend.
> Here is the documentation I recorded regarding changes we need to enable 
> SparkR as an option for Apache Toree: 
> https://github.com/apache/incubator-toree/tree/master/sparkr-interpreter/src/main/resources



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
