spark git commit: [SPARK-6812] [SPARKR] filter() on DataFrame does not work as expected.

shivaram Wed, 06 May 2015 22:49:13 -0700

Repository: spark
Updated Branches:
  refs/heads/branch-1.4 fb4967b5f -> 4948f42e7



[SPARK-6812] [SPARKR] filter() on DataFrame does not work as expected.

According to the R manual: 
https://stat.ethz.ch/R-manual/R-devel/library/base/html/Startup.html,
" if a function .First is found on the search path, it is executed as .First(). 
Finally, function .First.sys() in the base package is run. This calls require 
to attach the default packages specified by options("defaultPackages")."
In .First() in profile/shell.R, we load the SparkR package. This means SparkR 
is attached before the default packages. Any function in a default package 
that shares a name with a SparkR function then masks the SparkR one. This is 
why filter() in SparkR is masked by filter() in stats, which is usually in the 
default package list. We need to make sure SparkR is loaded after the default 
packages. The solution is to append SparkR to the default packages instead of 
loading SparkR in .First().
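
The masking rule described above can be reproduced without SparkR. The 
following is only a minimal sketch: early_pkg and late_pkg are hypothetical 
stand-ins for two packages that both define filter(), and the last two lines 
merely build the extended package vector without changing any option.

```r
# Hypothetical stand-ins for two packages that both define filter().
early_pkg <- list(filter = function(...) "early")
late_pkg  <- list(filter = function(...) "late")

# attach() inserts at position 2 of the search path by default, so whatever
# is attached last ends up in front of everything attached before it.
attach(early_pkg, name = "demo:early", warn.conflicts = FALSE)
attach(late_pkg,  name = "demo:late",  warn.conflicts = FALSE)

# filter() now resolves to the most recently attached definition, which
# masks both demo:early and stats::filter.
resolved <- filter()

# Clean up the search path again.
detach("demo:late", character.only = TRUE)
detach("demo:early", character.only = TRUE)

# Appending a package name to the default list means it is attached last
# at startup. Here the vector is only built, not installed via options().
default_pkgs <- getOption("defaultPackages")
proposed <- c(default_pkgs, "SparkR")
```

Because the last attached entry wins, placing SparkR at the end of 
defaultPackages is what lets its filter() take precedence over the one in 
stats.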

BTW, I'd like to discuss our policy on resolving name conflicts. Previously, 
we renamed an API from its Scala name whenever that name conflicted with base 
or other commonly used packages. In the long term, however, this is not good 
for API stability, because we can't predict name conflicts: for example, what 
if a name added to the base package in the future conflicts with an API in 
SparkR? So the better policy is to keep API names the same as Scala's without 
worrying about name conflicts. When users use SparkR, they should load SparkR 
as the last package, so that all API names are effective. Users can explicitly 
use :: to refer to hidden names from other packages. If we agree on this, I 
can submit a JIRA issue to change back some renamed API methods, for example, 
DataFrame.sortDF().
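
As an illustration of the :: escape hatch mentioned above, here is a small 
sketch that uses only base R. The global filter() defined here is a 
hypothetical stand-in for a masking definition such as the one SparkR would 
provide when attached last.

```r
# Hypothetical masking definition, standing in for a later-attached package.
filter <- function(...) "masking definition"

# The unqualified name resolves to the masking definition.
masked_result <- filter(1:5, rep(1/3, 3))

# The qualified name still reaches the hidden function in stats,
# here a 3-point moving average over 1:5.
explicit_result <- stats::filter(1:5, rep(1/3, 3))

# Remove the stand-in so later calls see the usual definition again.
rm(filter)
```

The same pattern applies in either direction: whichever package ends up 
hidden, its functions remain reachable through the package prefix.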

Author: Sun Rui <[email protected]>

Closes #5938 from sun-rui/SPARK-6812 and squashes the following commits:

b569145 [Sun Rui] [SPARK-6812][SparkR] filter() on DataFrame does not work as expected.

(cherry picked from commit 9cfa9a516ed991de6c5900c7285b47380a396142)
Signed-off-by: Shivaram Venkataraman <[email protected]>


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/4948f42e
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/4948f42e
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/4948f42e

Branch: refs/heads/branch-1.4
Commit: 4948f42e7940448e5c06e5e0c964aa336d17fd5d
Parents: fb4967b
Author: Sun Rui <[email protected]>
Authored: Wed May 6 22:48:16 2015 -0700
Committer: Shivaram Venkataraman <[email protected]>
Committed: Wed May 6 22:48:38 2015 -0700

----------------------------------------------------------------------
 R/pkg/inst/profile/shell.R | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/4948f42e/R/pkg/inst/profile/shell.R
----------------------------------------------------------------------
diff --git a/R/pkg/inst/profile/shell.R b/R/pkg/inst/profile/shell.R
index 7a7f203..33478d9 100644
--- a/R/pkg/inst/profile/shell.R
+++ b/R/pkg/inst/profile/shell.R
@@ -20,11 +20,13 @@
   .libPaths(c(file.path(home, "R", "lib"), .libPaths()))
   Sys.setenv(NOAWT=1)
 
-  library(utils)
-  library(SparkR)
-  sc <- sparkR.init(Sys.getenv("MASTER", unset = ""))
+  # Make sure SparkR package is the last loaded one
+  old <- getOption("defaultPackages")
+  options(defaultPackages = c(old, "SparkR"))
+
+  sc <- SparkR::sparkR.init(Sys.getenv("MASTER", unset = ""))
   assign("sc", sc, envir=.GlobalEnv)
-  sqlCtx <- sparkRSQL.init(sc)
+  sqlCtx <- SparkR::sparkRSQL.init(sc)
   assign("sqlCtx", sqlCtx, envir=.GlobalEnv)
   cat("\n Welcome to SparkR!")
   cat("\n Spark context is available as sc, SQL context is available as sqlCtx\n")

