[
https://issues.apache.org/jira/browse/SPARK-31918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17142127#comment-17142127
]
Hyukjin Kwon commented on SPARK-31918:
--------------------------------------
Just to share what I investigated:
The problem seems to relate to {{processClosure}}, called via {{cleanClosure}}, in SparkR.
From my observation, something goes wrong [when a new environment is set to a function|https://github.com/apache/spark/blob/master/R/pkg/R/utils.R#L601], especially when that function references S4 generics.
So, for example, if you skip such generics with the fix below:
{code:java}
diff --git a/R/pkg/R/utils.R b/R/pkg/R/utils.R
index 65db9c21d9d..60cad588f5e 100644
--- a/R/pkg/R/utils.R
+++ b/R/pkg/R/utils.R
@@ -529,7 +529,9 @@ processClosure <- function(node, oldEnv, defVars, checkedFuncs, newEnv) {
         # Namespaces other than "SparkR" will not be searched.
         if (!isNamespace(func.env) ||
             (getNamespaceName(func.env) == "SparkR" &&
-              !(nodeChar %in% getNamespaceExports("SparkR")))) {
+              !(nodeChar %in% getNamespaceExports("SparkR")) &&
+              # Skip all generics under SparkR - R 4.0.0 seems to have an issue.
+              !isGeneric(nodeChar, func.env))) {
{code}
{code:java}
* checking re-building of vignette outputs ... OK
{code}
and the CRAN check passes with the current master branch on my local machine.
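For reference, here is a small standalone sketch of the extra check the patch adds: {{isGeneric}} (from the {{methods}} package) reports whether a name is bound to an S4 generic. The generic name below is hypothetical, purely for illustration:
{code:r}
library(methods)

# Define a hypothetical S4 generic, then probe names the way the patch does.
setGeneric("myGeneric", function(x) standardGeneric("myGeneric"))

isGeneric("myGeneric")  # TRUE  - such names would now be skipped by processClosure
isGeneric("noSuchFun")  # FALSE - plain (or unbound) names are captured as before
{code}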
For a minimal reproducer, apply this diff:
{code:java}
diff --git a/R/pkg/R/RDD.R b/R/pkg/R/RDD.R
index 7a1d157bb8a..89250c37319 100644
--- a/R/pkg/R/RDD.R
+++ b/R/pkg/R/RDD.R
@@ -487,6 +487,7 @@ setMethod("lapply",
             func <- function(partIndex, part) {
               lapply(part, FUN)
             }
+            print(SparkR:::cleanClosure(func)(1, 2))
             lapplyPartitionsWithIndex(X, func)
           })
{code}
then run:
{code:java}
createDataFrame(lapply(seq(100), function (e) list(value=e)))
{code}
When {{lapply}} is called against the RDD inside {{createDataFrame}}, the cleaned closure's environment contains SparkR's {{lapply}} as an S4 method, which leads to an error such as {{attempt to bind a variable to R_UnboundValue}}.
Hopefully this is the cause of the issue here, and not a problem in my environment. cc [~felixcheung], [~dongjoon] FYI.
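For context, {{cleanClosure}} essentially walks the closure body, copies each free variable it finds into a fresh environment, and re-homes the function there. A simplified sketch of that capture-and-rehome pattern (not SparkR's actual code):
{code:r}
y <- 1
f <- function(x) x + y      # y is a free variable of f

newEnv <- new.env(parent = globalenv())
assign("y", get("y"), envir = newEnv)  # copy the free variable into newEnv
environment(f) <- newEnv               # re-home the closure

f(1)  # 2
{code}
The failure above corresponds to this capture step when the binding being copied is an S4 generic such as SparkR's {{lapply}} rather than a plain value.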
> SparkR CRAN check gives a warning with R 4.0.0 on OSX
> -----------------------------------------------------
>
> Key: SPARK-31918
> URL: https://issues.apache.org/jira/browse/SPARK-31918
> Project: Spark
> Issue Type: Bug
> Components: SparkR
> Affects Versions: 2.4.6, 3.0.0
> Reporter: Shivaram Venkataraman
> Priority: Major
>
> When the SparkR package is run through a CRAN check (i.e. with something like
> R CMD check --as-cran ~/Downloads/SparkR_2.4.6.tar.gz), we rebuild the SparkR
> vignette as a part of the checks.
> However this seems to be failing with R 4.0.0 on OSX -- both on my local
> machine and on CRAN
> https://cran.r-project.org/web/checks/check_results_SparkR.html
> cc [~felixcheung]
--
This message was sent by Atlassian Jira
(v8.3.4#803005)