[
https://issues.apache.org/jira/browse/SPARK-41937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Vivek Atal updated SPARK-41937:
-------------------------------
Description:
Base R 4.2.0 introduced a change ([[Rd] R 4.2.0 is
released|https://stat.ethz.ch/pipermail/r-announce/2022/000683.html]):
"{{Calling if() or while() with a condition of length greater than one gives
an error rather than a warning.}}"
The code below is a reproducible example of the issue: executed on R >= 4.2.0
it raises an error, while on earlier versions it only emits a warning.
{{Sys.time()}} returns a multi-class object in R, and throughout the SparkR
repository the {{if}} statement is written as {{if (class(x) == "Column")}},
which fails on R >= 4.2.0. Note that R allows an object to carry multiple
{{class}} names as a character vector ([R: Object
Classes|https://stat.ethz.ch/R-manual/R-devel/library/base/html/class.html]);
comparing {{class(x)}} with {{==}} was therefore never a reliable check in the
first place.
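The fragility can be shown in base R alone, without SparkR (a minimal
illustration; the class vector shown is what {{Sys.time()}} actually returns):
{code:r}
t <- Sys.time()
class(t)                 # c("POSIXct", "POSIXt"): a length-2 character vector
class(t) == "POSIXct"    # TRUE FALSE: '==' is vectorized over the class names
inherits(t, "POSIXct")   # TRUE: always length 1, so it is safe inside if()
{code}
On R >= 4.2.0, {{if (class(t) == "POSIXct")}} stops with "the condition has
length > 1", whereas {{if (inherits(t, "POSIXct"))}} works on every R version.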
The chunks below were executed on R version 4.1.3.
{code:r}
{
SparkR::sparkR.session()
t <- Sys.time()
sdf <- SparkR::createDataFrame(data.frame(x = t + c(-1, 1, -1, 1, -1)))
SparkR::collect(SparkR::filter(sdf, SparkR::column('x') > t))
}
#> Warning in if (class(e2) == 'Column') {: the condition has length > 1
#> and only the first element will be used
#> x
#> 1 2023-01-07 20:40:20
#> 2 2023-01-07 20:40:20
{code}
{code:r}
{
Sys.setenv(`_R_CHECK_LENGTH_1_CONDITION_` = "true")
SparkR::sparkR.session()
t <- Sys.time()
sdf <- SparkR::createDataFrame(data.frame(x = t + c(-1, 1, -1, 1, -1)))
SparkR::collect(SparkR::filter(sdf, SparkR::column('x') > t))
}
#> Error in h(simpleError(msg, call)): error in evaluating the argument 'x'
#> in selecting a method for function 'collect': error in evaluating the
#> argument 'condition' in selecting a method for function 'filter': the
#> condition has length > 1
{code}
A similar issue occurs in these SparkR functions wherever multi-class data
such as {{Sys.time()}} may be passed: {{lit}}, {{fillna}}, {{when}},
{{otherwise}}, {{contains}}, {{ifelse}}.
The suggested change is to wrap the check in {{all}} (or {{any}}, as
appropriate) when testing whether {{class(.)}} equals {{"Column"}}:
{{if (all(class(.) == "Column"))}}. Better still, use {{base::inherits}} for
the check: {{if (inherits(., "Column"))}}.
> SparkR datetime column compare with Sys.time() throws error in R (>= 4.2.0)
> ---------------------------------------------------------------------------
>
> Key: SPARK-41937
> URL: https://issues.apache.org/jira/browse/SPARK-41937
> Project: Spark
> Issue Type: Bug
> Components: R, SparkR
> Affects Versions: 3.3.0
> Reporter: Vivek Atal
> Priority: Minor
> Labels: newbie
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]