[
https://issues.apache.org/jira/browse/SPARK-13573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15175079#comment-15175079
]
Chip Senkbeil commented on SPARK-13573:
---------------------------------------
[~sunrui], IIRC, Toree supported SparkR from 1.4.x and 1.5.x. Just a bit of a
pain to keep in sync.
So, the process Toree uses to interact with SparkR is as follows:
# We added a SparkR.connect method
(https://github.com/apache/incubator-toree/blob/master/sparkr-interpreter/src/main/resources/R/pkg/R/sparkR.R#L220)
that uses the EXISTING_SPARKR_BACKEND_PORT to connect to an R backend but does
not attempt to initialize the Spark Context
# We use the exposed callJStatic to acquire a reference to a Java (well, Scala)
object that has additional variables like the Spark Context hanging off of it
(https://github.com/apache/incubator-toree/blob/master/sparkr-interpreter/src/main/resources/kernelR/sparkr_runner.R#L50)
{code}# Retrieve the bridge used to perform actions on the JVM
bridge <- callJStatic(
"org.apache.toree.kernel.interpreter.sparkr.SparkRBridge", "sparkRBridge"
)
# Retrieve the state used to pull code off the JVM and push results back
state <- callJMethod(bridge, "state")
# Acquire the kernel API instance to expose
kernel <- callJMethod(bridge, "kernel")
assign("kernel", kernel, .runnerEnv){code}
# We then invoke methods using callJMethod to get the next string of R code to
evaluate {code}# Load the container of the code
codeContainer <- callJMethod(state, "nextCode")
# If not valid result, wait 1 second and try again
if (!class(codeContainer) == "jobj") {
Sys.sleep(1)
next()
}
# Retrieve the code id (for response) and code
codeId <- callJMethod(codeContainer, "codeId")
code <- callJMethod(codeContainer, "code"){code}
# Finally, we evaluate the acquired code string and send the results back to
our running JVM (which represents a Jupyter kernel) {code}# Parse the code into an expression to be evaluated
codeExpr <- parse(text = code)
print(paste("Code expr", codeExpr))
tryCatch({
# Evaluate the code provided and capture the result as a string
result <- capture.output(eval(codeExpr, envir = .runnerEnv))
print(paste("Result type", class(result), length(result)))
print(paste("Success", codeId, result))
# Mark the execution as a success and send back the result
# If output is null/empty, ensure that we can send it (otherwise fails)
if (is.null(result) || length(result) <= 0) {
print("Marking success with no output")
callJMethod(state, "markSuccess", codeId)
} else {
# Clean the result before sending it back
cleanedResult <- trimws(flatten(result, shouldTrim = FALSE))
print(paste("Marking success with output:", cleanedResult))
callJMethod(state, "markSuccess", codeId, cleanedResult)
}
}, error = function(ex) {
# Mark the execution as a failure and send back the error
print(paste("Failure", codeId, toString(ex)))
callJMethod(state, "markFailure", codeId, toString(ex))
}){code}
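The poll, evaluate, and report cycle from steps 3 and 4 can be tried without a JVM. What follows is only a minimal sketch under stated assumptions: newMockState and runPending are hypothetical names that exist in neither Toree nor SparkR, and plain R closures stand in for the callJMethod calls on the real state object. The method names nextCode, markSuccess, and markFailure mirror the ones used above.

```r
# Hypothetical stand-in for the JVM-side state object. In Toree these
# operations are reached through callJMethod(state, ...) instead.
newMockState <- function(snippets) {
  queue <- snippets   # named list: code id -> code string
  results <- list()   # code id -> list(status, output)
  list(
    nextCode = function() {
      if (length(queue) == 0) return(NULL)
      item <- list(codeId = names(queue)[1], code = queue[[1]])
      queue <<- queue[-1]
      item
    },
    markSuccess = function(codeId, output = "") {
      results[[codeId]] <<- list(status = "success", output = output)
    },
    markFailure = function(codeId, message) {
      results[[codeId]] <<- list(status = "failure", output = message)
    },
    results = function() results
  )
}

# Drain the queue: evaluate each snippet and report the outcome.
# The real runner sleeps and polls again instead of stopping when
# no code is available.
runPending <- function(state, env = new.env()) {
  repeat {
    container <- state$nextCode()
    if (is.null(container)) break
    tryCatch({
      # Capture printed output of the evaluated code as a string
      output <- capture.output(eval(parse(text = container$code), envir = env))
      state$markSuccess(container$codeId, paste(output, collapse = "\n"))
    }, error = function(ex) {
      state$markFailure(container$codeId, conditionMessage(ex))
    })
  }
  invisible(state$results())
}

state <- newMockState(list(a = "1 + 1", b = "stop('boom')"))
results <- runPending(state)
```

After the run, results holds one entry per code id, each with a status and the captured output or error message, which is roughly the information the real runner pushes back to the kernel.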
> Open SparkR APIs (R package) to allow better 3rd party usage
> ------------------------------------------------------------
>
> Key: SPARK-13573
> URL: https://issues.apache.org/jira/browse/SPARK-13573
> Project: Spark
> Issue Type: Improvement
> Components: SparkR
> Reporter: Chip Senkbeil
>
> Currently, SparkR's R package does not expose enough of its APIs to be used
> flexibly. As far as I am aware, SparkR still requires you to create a new
> SparkContext by invoking the sparkR.init method (so you cannot connect to a
> running one) and there is no way to invoke custom Java methods using the
> exposed SparkR API (unlike PySpark).
> We currently maintain a fork of SparkR that is used to power the R
> implementation of Apache Toree, which is a gateway to use Apache Spark. This
> fork provides a connect method (to use an existing Spark Context), exposes
> needed methods like invokeJava (to be able to communicate with our JVM to
> retrieve code to run, etc), and uses reflection to access
> org.apache.spark.api.r.RBackend.
> Here is the documentation I recorded regarding changes we need to enable
> SparkR as an option for Apache Toree:
> https://github.com/apache/incubator-toree/tree/master/sparkr-interpreter/src/main/resources