[
https://issues.apache.org/jira/browse/SPARK-6816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14488237#comment-14488237
]
Shivaram Venkataraman commented on SPARK-6816:
----------------------------------------------
Comments from SparkR JIRA
Shivaram Venkataraman added a comment - 14/Feb/15 10:32 AM
I looked at this recently and I think the existing arguments to `sparkR.init`
pretty much cover all the options that are exposed in SparkConf.
We could split things out of the function arguments into a separate SparkConf
object (something like PySpark's
https://github.com/apache/spark/blob/master/python/pyspark/conf.py), but the
setter methods don't translate very well to the style we use in SparkR. For
example, it would be something like setAppName(setMaster(conf, "local"),
"SparkR") instead of conf.setMaster("local").setAppName("SparkR").
The other thing brought up by this JIRA is that we should parse arguments
passed to spark-submit or set in spark-defaults.conf. I think this should
automatically happen with SPARKR-178.
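For reference, a rough sketch of what parsing spark-defaults.conf into a named
list could look like (readSparkDefaults is a hypothetical helper; SPARKR-178
implements the real logic):
{code}
# Hypothetical helper: parse spark-defaults.conf lines of the form
# "spark.key value" (whitespace-separated) into a named list.
readSparkDefaults <- function(path) {
  lines <- trimws(readLines(path))
  lines <- lines[nzchar(lines) & !grepl("^#", lines)]  # drop blanks/comments
  parts <- strsplit(lines, "[ \t]+")
  vals  <- lapply(parts, function(p) paste(p[-1], collapse = " "))
  names(vals) <- vapply(parts, `[[`, character(1), 1)
  vals
}
{code}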
Sun Rui, Zongheng Yang: any thoughts on this?
Zongheng Yang added a comment - 15/Feb/15 12:07 PM
I'm +1 on not using the builder pattern in R. What about using a named list or
an environment to simulate a SparkConf? For example, users can write something
like:
{code}
> conf <- list(spark.master = "local[2]", spark.executor.memory = "12g")
> conf
$spark.master
[1] "local[2]"
$spark.executor.memory
[1] "12g"
{code}
and pass the named list to `sparkR.init()`.
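For example, the call could look like this (sketch only; `conf` is not an
existing sparkR.init argument):
{code}
# Sketch of the proposed API: sparkR.init consuming a plain named list.
conf <- list(spark.master = "local[2]", spark.executor.memory = "12g")
sc <- sparkR.init(conf = conf)
{code}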
Shivaram Venkataraman added a comment - 15/Feb/15 5:50 PM
I think the named list might be okay (one thing is that we will have nested
named lists for things like executorEnv). However, I am not sure if named lists
are better than just passing named arguments to `sparkR.init`. I guess the
better way to ask my question is: what functionality do we want to provide to
the users?
Right now users can pretty much set anything they want in the SparkConf using
sparkR.init.
One piece of functionality that is missing is printing the conf and inspecting
which config variables are set. We could, say, add a getConf(sc) that returns a
named list to provide this feature; a sketch is below.
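A minimal sketch, assuming sparkR.init saved its settings as a named list on
the context handle (getConf is not an existing SparkR function):
{code}
# Hypothetical sketch: if the init path stashed the conf as a named list
# on the SparkContext handle, getConf would just return it for inspection.
getConf <- function(sc) {
  sc$conf   # the saved named list of spark.* settings
}
# print(getConf(sc)) would then show entries like $spark.master, etc.
{code}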
Is there any other functionality we need?
Zongheng Yang added a comment - 21/Feb/15 3:22 PM
IMO using a named list provides more flexibility: it's ordinary data that users
can operate on and transform. Using only parameter passing in the constructor
locks users into operating on code instead of data. It'd also be easier to just
return the saved named list if we're going to implement getConf()?
Some relevant discussions: https://aphyr.com/posts/321-builders-vs-option-maps
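As a sketch of that "conf as data" point, using only base R (the variable
names are illustrative):
{code}
# Because the conf is an ordinary named list, users can transform it with
# plain R functions before handing it to sparkR.init.
defaults  <- list(spark.master = "local[2]", spark.executor.memory = "1g")
overrides <- list(spark.executor.memory = "12g")
conf <- modifyList(defaults, overrides)  # override one key, keep the rest
names(conf)                              # inspect which keys are set
{code}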
Shivaram Venkataraman added a comment - 22/Feb/15 4:33 PM
Hmm okay, named lists are not quite the same as option maps though. To move
forward it'll be good to see what the new API functions we want on the R side
should look like.
Let's keep this discussion open, but I'm going to change the priority /
description (we are already able to read in spark-defaults.conf now that
SPARKR-178 has been merged).
> Add SparkConf API to configure SparkR
> -------------------------------------
>
> Key: SPARK-6816
> URL: https://issues.apache.org/jira/browse/SPARK-6816
> Project: Spark
> Issue Type: New Feature
> Components: SparkR
> Reporter: Shivaram Venkataraman
> Priority: Minor
>
> Right now the only way to configure SparkR is to pass in arguments to
> sparkR.init. The goal is to add an API similar to the SparkConf API in
> Scala/Python to make configuration easier.