I need to make an R environment available in which the
SparkSession/SparkContext must be set up in a specific way. The user simply
accesses this environment and executes their code. If the user code does
not call any Spark functions, I do not want to create a SparkContext
unnecessarily.

In Scala/Python environments, the user can't access Spark without first
referencing the SparkContext / SparkSession classes. So the above (lazy
and/or custom SparkSession/Context creation) is easily met by offering
sparkContext/sparkSession handles to the user that are either wrappers
around Spark's classes or have lazy evaluation semantics. This way, the
SparkSession/Context is actually set up only when the user accesses these
handles, and the user does not need to know the details of initializing
the SparkContext/Session.

However, achieving the same doesn't appear to be so straightforward in R.
From what I see, executing sparkR.session(...) sets up private variables in
SparkR:::.sparkREnv (.sparkRjsc, .sparkRsession). The way the SparkR API
works, a user doesn't need a handle to the Spark session as such. Executing
a function such as "df <- as.DataFrame(..)" implicitly accesses the private
vars in SparkR:::.sparkREnv to get at the sparkContext etc., which are
expected to have been created by a prior call to
sparkR.session()/sparkR.init() etc.

Therefore, to inject any custom/lazy behavior into this, I don't see a way
except to have my code (which sits outside of Spark) apply
delayedAssign() or makeActiveBinding() to the .sparkRsession /
.sparkRjsc variables in SparkR:::.sparkREnv. This way, when Spark code
internally references them, my wrapper/lazy code gets executed to do
whatever I need done.

However, I am seeing some limitations in applying even this approach to
SparkR: it will not work unless some minor changes are made in the SparkR
code. But before I open a PR with these changes to SparkR, I wanted to
check whether there is a better way to achieve this. I am far from an R
expert and could be missing something here.

If you'd rather see this in a JIRA and a PR, let me know and I'll go ahead
and open one.

Regards,
Vin.
