[ https://issues.apache.org/jira/browse/SPARK-13573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15175079#comment-15175079 ]
Chip Senkbeil commented on SPARK-13573:
---------------------------------------

[~sunrui], IIRC, Toree supported SparkR from 1.4.x and 1.5.x. It's just a bit of a pain to keep in sync.

So, the process Toree uses to interact with SparkR is as follows:

# We added a SparkR.connect method (https://github.com/apache/incubator-toree/blob/master/sparkr-interpreter/src/main/resources/R/pkg/R/sparkR.R#L220) that uses the EXISTING_SPARKR_BACKEND_PORT to connect to an R backend but does not attempt to initialize the Spark Context.
# We use the exposed callJStatic to acquire a reference to a Java (well, Scala) object that has additional variables like the Spark Context hanging off of it (https://github.com/apache/incubator-toree/blob/master/sparkr-interpreter/src/main/resources/kernelR/sparkr_runner.R#L50):
{code}
# Retrieve the bridge used to perform actions on the JVM
bridge <- callJStatic(
  "org.apache.toree.kernel.interpreter.sparkr.SparkRBridge", "sparkRBridge"
)

# Retrieve the state used to pull code off the JVM and push results back
state <- callJMethod(bridge, "state")

# Acquire the kernel API instance to expose
kernel <- callJMethod(bridge, "kernel")
assign("kernel", kernel, .runnerEnv)
{code}
# We then invoke methods using callJMethod to get the next string of R code to evaluate:
{code}
# Load the container of the code
codeContainer <- callJMethod(state, "nextCode")

# If not a valid result, wait 1 second and try again
# (this excerpt runs inside the runner's polling loop)
if (!class(codeContainer) == "jobj") {
  Sys.sleep(1)
  next
}

# Retrieve the code id (for the response) and the code itself
codeId <- callJMethod(codeContainer, "codeId")
code <- callJMethod(codeContainer, "code")
{code}
# Finally, we evaluate the acquired code string and send the results back to our running JVM (which represents a Jupyter kernel):
{code}
# Parse the code into an expression to be evaluated
codeExpr <- parse(text = code)
print(paste("Code expr", codeExpr))

tryCatch({
  # Evaluate the code provided and capture the result as a string
  result <- capture.output(eval(codeExpr, envir = .runnerEnv))
  print(paste("Result type", class(result), length(result)))
  print(paste("Success", codeId, result))

  # Mark the execution as a success and send back the result
  # If output is null/empty, ensure that we can send it (otherwise fails)
  if (is.null(result) || length(result) <= 0) {
    print("Marking success with no output")
    callJMethod(state, "markSuccess", codeId)
  } else {
    # Clean the result before sending it back
    cleanedResult <- trimws(flatten(result, shouldTrim = FALSE))
    print(paste("Marking success with output:", cleanedResult))
    callJMethod(state, "markSuccess", codeId, cleanedResult)
  }
}, error = function(ex) {
  # Mark the execution as a failure and send back the error
  print(paste("Failure", codeId, toString(ex)))
  callJMethod(state, "markFailure", codeId, toString(ex))
})
{code}

> Open SparkR APIs (R package) to allow better 3rd party usage
> ------------------------------------------------------------
>
> Key: SPARK-13573
> URL: https://issues.apache.org/jira/browse/SPARK-13573
> Project: Spark
> Issue Type: Improvement
> Components: SparkR
> Reporter: Chip Senkbeil
>
> Currently, SparkR's R package does not expose enough of its APIs to be used
> flexibly. As far as I am aware, SparkR still requires you to create a new
> SparkContext by invoking the sparkR.init method (so you cannot connect to a
> running one), and there is no way to invoke custom Java methods using the
> exposed SparkR API (unlike PySpark).
>
> We currently maintain a fork of SparkR that is used to power the R
> implementation of Apache Toree, which is a gateway to use Apache Spark. This
> fork provides a connect method (to use an existing Spark Context), exposes
> needed methods like invokeJava (to be able to communicate with our JVM to
> retrieve code to run, etc.), and uses reflection to access
> org.apache.spark.api.r.RBackend.
>
> Here is the documentation I recorded regarding changes we need to enable
> SparkR as an option for Apache Toree:
> https://github.com/apache/incubator-toree/tree/master/sparkr-interpreter/src/main/resources

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org