ianmcook commented on a change in pull request #10751:
URL: https://github.com/apache/arrow/pull/10751#discussion_r672619775
##########
File path: r/R/dplyr-functions.R
##########
@@ -57,6 +57,30 @@ nse_funcs$cast <- function(x, target_type, safe = TRUE, ...)
{
Expression$create("cast", x, options = opts)
}
+nse_funcs$coalesce <- function(...) {
+ if (missing(..1)) {
+ abort("At least one argument must be supplied to coalesce()")
+ }
+ # TODO: if an option is added to the coalesce kernel to treat NaN as NA,
+ # use that to simplify the code here (ARROW-13389)
+ args <- lapply(list2(...), function(arg) {
+ if (!inherits(arg, "Expression")) {
+ arg <- Expression$scalar(arg)
+ }
+ if (arg$type_id() %in% TYPES_WITH_NAN) {
+ # replace NaN with NA, using Arrow's smallest float type to avoid casting
+ # smaller float types to larger float types
Review comment:
This approach is flawed because it replaces `NaN` with `NA` in the
result when all the values are `NaN`
##########
File path: r/R/dplyr-functions.R
##########
@@ -57,6 +57,30 @@ nse_funcs$cast <- function(x, target_type, safe = TRUE, ...)
{
Expression$create("cast", x, options = opts)
}
+nse_funcs$coalesce <- function(...) {
+ if (missing(..1)) {
+ abort("At least one argument must be supplied to coalesce()")
+ }
+ # TODO: if an option is added to the coalesce kernel to treat NaN as NA,
+ # use that to simplify the code here (ARROW-13389)
+ args <- lapply(list2(...), function(arg) {
+ if (!inherits(arg, "Expression")) {
+ arg <- Expression$scalar(arg)
+ }
+ if (arg$type_id() %in% TYPES_WITH_NAN) {
+ # replace NaN with NA, using Arrow's smallest float type to avoid casting
+ # smaller float types to larger float types
Review comment:
Oh, I think the solution is to skip the last iteration of the `lapply`
##########
File path: r/R/dplyr-functions.R
##########
@@ -57,6 +57,30 @@ nse_funcs$cast <- function(x, target_type, safe = TRUE, ...)
{
Expression$create("cast", x, options = opts)
}
+nse_funcs$coalesce <- function(...) {
+ if (missing(..1)) {
+ abort("At least one argument must be supplied to coalesce()")
+ }
+ # TODO: if an option is added to the coalesce kernel to treat NaN as NA,
+ # use that to simplify the code here (ARROW-13389)
+ args <- lapply(list2(...), function(arg) {
+ if (!inherits(arg, "Expression")) {
+ arg <- Expression$scalar(arg)
+ }
+ if (arg$type_id() %in% TYPES_WITH_NAN) {
+ # replace NaN with NA, using Arrow's smallest float type to avoid casting
+ # smaller float types to larger float types
Review comment:
Fixed in 3a91ee9
##########
File path: r/R/dplyr-functions.R
##########
@@ -57,6 +57,30 @@ nse_funcs$cast <- function(x, target_type, safe = TRUE, ...)
{
Expression$create("cast", x, options = opts)
}
+nse_funcs$coalesce <- function(...) {
+ if (missing(..1)) {
+ abort("At least one argument must be supplied to coalesce()")
+ }
+ # TODO: if an option is added to the coalesce kernel to treat NaN as NA,
+ # use that to simplify the code here (ARROW-13389)
+ args <- lapply(list2(...), function(arg) {
+ if (!inherits(arg, "Expression")) {
+ arg <- Expression$scalar(arg)
+ }
+ if (arg$type_id() %in% TYPES_WITH_NAN) {
+ # replace NaN with NA, using Arrow's smallest float type to avoid casting
+ # smaller float types to larger float types
Review comment:
Fixed in 3a91ee9. I used an attribute named `last` to keep track of
which iteration is the final one.
##########
File path: r/R/dplyr-functions.R
##########
@@ -57,6 +57,30 @@ nse_funcs$cast <- function(x, target_type, safe = TRUE, ...)
{
Expression$create("cast", x, options = opts)
}
+nse_funcs$coalesce <- function(...) {
+ if (missing(..1)) {
+ abort("At least one argument must be supplied to coalesce()")
+ }
+ # TODO: if an option is added to the coalesce kernel to treat NaN as NA,
+ # use that to simplify the code here (ARROW-13389)
+ args <- lapply(list2(...), function(arg) {
+ if (!inherits(arg, "Expression")) {
+ arg <- Expression$scalar(arg)
+ }
+ if (arg$type_id() %in% TYPES_WITH_NAN) {
+ # replace NaN with NA, using Arrow's smallest float type to avoid casting
+ # smaller float types to larger float types
Review comment:
Fixed now. I used an attribute named `last` to keep track of which
iteration is the final one.
##########
File path: r/R/dplyr-functions.R
##########
@@ -57,6 +57,30 @@ nse_funcs$cast <- function(x, target_type, safe = TRUE, ...)
{
Expression$create("cast", x, options = opts)
}
+nse_funcs$coalesce <- function(...) {
+ if (missing(..1)) {
+ abort("At least one argument must be supplied to coalesce()")
+ }
+ # TODO: if an option is added to the coalesce kernel to treat NaN as NA,
+ # use that to simplify the code here (ARROW-13389)
+ args <- lapply(list2(...), function(arg) {
+ if (!inherits(arg, "Expression")) {
+ arg <- Expression$scalar(arg)
+ }
+ if (arg$type_id() %in% TYPES_WITH_NAN) {
+ # replace NaN with NA, using Arrow's smallest float type to avoid casting
+ # smaller float types to larger float types
Review comment:
FYI, when all values in the row are `NA` or `NaN`, `dplyr::coalesce()`
returns the _last_ one, which is why this solution works:
```
> dplyr::coalesce(NA, NaN)
[1] NaN
> dplyr::coalesce(NaN, NA)
[1] NA
```
##########
File path: r/R/dplyr-functions.R
##########
@@ -57,6 +57,30 @@ nse_funcs$cast <- function(x, target_type, safe = TRUE, ...)
{
Expression$create("cast", x, options = opts)
}
+nse_funcs$coalesce <- function(...) {
+ if (missing(..1)) {
+ abort("At least one argument must be supplied to coalesce()")
+ }
+ # TODO: if an option is added to the coalesce kernel to treat NaN as NA,
+ # use that to simplify the code here (ARROW-13389)
+ args <- lapply(list2(...), function(arg) {
+ if (!inherits(arg, "Expression")) {
+ arg <- Expression$scalar(arg)
+ }
+ if (arg$type_id() %in% TYPES_WITH_NAN) {
+ # replace NaN with NA, using Arrow's smallest float type to avoid casting
+ # smaller float types to larger float types
Review comment:
FYI, when all values in the row are `NA` or `NaN`, `dplyr::coalesce()`
returns the _last_ one, which is why this solution works:
```
> dplyr::coalesce(NA_real_, NaN)
[1] NaN
> dplyr::coalesce(NaN, NA_real_)
[1] NA
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]