[ https://issues.apache.org/jira/browse/SPARK-31918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17142127#comment-17142127 ]

Hyukjin Kwon commented on SPARK-31918:
--------------------------------------

Just to share what I investigated:

The problem seems to be in {{processClosure}}, called via {{cleanClosure}} in SparkR.
 From my observation, there appears to be a problem [when the new environment is set to a function|https://github.com/apache/spark/blob/master/R/pkg/R/utils.R#L601], especially when that function includes S4 generics.
 So, for example, if you skip that step with the fix below:
{code:java}
diff --git a/R/pkg/R/utils.R b/R/pkg/R/utils.R
index 65db9c21d9d..60cad588f5e 100644
--- a/R/pkg/R/utils.R
+++ b/R/pkg/R/utils.R
@@ -529,7 +529,9 @@ processClosure <- function(node, oldEnv, defVars, checkedFuncs, newEnv) {
         # Namespaces other than "SparkR" will not be searched.
         if (!isNamespace(func.env) ||
             (getNamespaceName(func.env) == "SparkR" &&
-               !(nodeChar %in% getNamespaceExports("SparkR")))) {
+               !(nodeChar %in% getNamespaceExports("SparkR")) &&
+                  # Skip all generics under SparkR - R 4.0.0 seems to have an issue.
+                  !isGeneric(nodeChar, func.env))) {
{code}
{code:java}
* checking re-building of vignette outputs ... OK
{code}
With this fix, the CRAN check passes against the current master branch in my local environment.
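For reference, the failing check can be reproduced locally by building the SparkR source package and running the same check CRAN runs (a sketch; the paths assume a Spark source checkout, and the resulting tarball name depends on the package version):

{code:bash}
# Build the SparkR source package from a Spark checkout (path is illustrative)
R CMD build R/pkg

# Run the same checks CRAN runs, including rebuilding the vignettes
R CMD check --as-cran SparkR_*.tar.gz
{code}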

For a minimal reproducer, apply this diff:
{code:java}
diff --git a/R/pkg/R/RDD.R b/R/pkg/R/RDD.R
index 7a1d157bb8a..89250c37319 100644
--- a/R/pkg/R/RDD.R
+++ b/R/pkg/R/RDD.R
@@ -487,6 +487,7 @@ setMethod("lapply",
             func <- function(partIndex, part) {
               lapply(part, FUN)
             }
+            print(SparkR:::cleanClosure(func)(1, 2))
             lapplyPartitionsWithIndex(X, func)
           })
{code}
and run:
{code:java}
createDataFrame(lapply(seq(100), function (e) list(value=e)))
{code}
When {{lapply}} is called against the RDD in {{createDataFrame}}, the cleaned closure's environment contains SparkR's {{lapply}} as an S4 method, which leads to an error such as {{attempt to bind a variable to R_UnboundValue}}.
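As a side note, here is a minimal sketch of what the {{isGeneric}} guard in the diff above checks (plain R via the base {{methods}} package, no Spark needed; the {{myFun}} name is illustrative):

{code:r}
library(methods)

# Define an S4 generic in the current environment.
setGeneric("myFun", function(x) standardGeneric("myFun"))

# isGeneric() reports whether a name is bound to an S4 generic in the
# given environment; this is the condition the proposed guard adds so
# that generics are skipped when the closure is cleaned.
isGeneric("myFun")   # TRUE: myFun is an S4 generic
isGeneric("paste")   # FALSE: paste is an ordinary base function
{code}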

Hopefully this is the cause of the issue reported here, and not something specific to my environment. cc [~felixcheung], [~dongjoon] FYI.

> SparkR CRAN check gives a warning with R 4.0.0 on OSX
> -----------------------------------------------------
>
>                 Key: SPARK-31918
>                 URL: https://issues.apache.org/jira/browse/SPARK-31918
>             Project: Spark
>          Issue Type: Bug
>          Components: SparkR
>    Affects Versions: 2.4.6, 3.0.0
>            Reporter: Shivaram Venkataraman
>            Priority: Major
>
> When the SparkR package is run through a CRAN check (i.e. with something like 
> R CMD check --as-cran ~/Downloads/SparkR_2.4.6.tar.gz), we rebuild the SparkR 
> vignette as a part of the checks.
> However this seems to be failing with R 4.0.0 on OSX -- both on my local 
> machine and on CRAN 
> https://cran.r-project.org/web/checks/check_results_SparkR.html
> cc [~felixcheung]


