[GitHub] [arrow] ianmcook commented on a change in pull request #10751: ARROW-13201: [R] Add binding for coalesce()

GitBox Tue, 20 Jul 2021 05:08:05 -0700


ianmcook commented on a change in pull request #10751:
URL: https://github.com/apache/arrow/pull/10751#discussion_r672619775




##########
File path: r/R/dplyr-functions.R
##########
@@ -57,6 +57,30 @@ nse_funcs$cast <- function(x, target_type, safe = TRUE, ...) 
{
   Expression$create("cast", x, options = opts)
 }
 
+nse_funcs$coalesce <- function(...) {
+  if (missing(..1)) {
+    abort("At least one argument must be supplied to coalesce()")
+  }
+  # TODO: if an option is added to the coalesce kernel to treat NaN as NA,
+  # use that to simplify the code here (ARROW-13389)
+  args <- lapply(list2(...), function(arg) {
+    if (!inherits(arg, "Expression")) {
+      arg <- Expression$scalar(arg)
+    }
+    if (arg$type_id() %in% TYPES_WITH_NAN) {
+      # replace NaN with NA, using Arrow's smallest float type to avoid casting
+      # smaller float types to larger float types

Review comment:
       This approach is flawed because it replaces `NaN` with `NA` in the 
result when all the values are `NaN`

##########
File path: r/R/dplyr-functions.R
##########
@@ -57,6 +57,30 @@ nse_funcs$cast <- function(x, target_type, safe = TRUE, ...) 
{
   Expression$create("cast", x, options = opts)
 }
 
+nse_funcs$coalesce <- function(...) {
+  if (missing(..1)) {
+    abort("At least one argument must be supplied to coalesce()")
+  }
+  # TODO: if an option is added to the coalesce kernel to treat NaN as NA,
+  # use that to simplify the code here (ARROW-13389)
+  args <- lapply(list2(...), function(arg) {
+    if (!inherits(arg, "Expression")) {
+      arg <- Expression$scalar(arg)
+    }
+    if (arg$type_id() %in% TYPES_WITH_NAN) {
+      # replace NaN with NA, using Arrow's smallest float type to avoid casting
+      # smaller float types to larger float types

Review comment:
       Oh, I think the solution is to skip the last iteration of the `lapply`

##########
File path: r/R/dplyr-functions.R
##########
@@ -57,6 +57,30 @@ nse_funcs$cast <- function(x, target_type, safe = TRUE, ...) 
{
   Expression$create("cast", x, options = opts)
 }
 
+nse_funcs$coalesce <- function(...) {
+  if (missing(..1)) {
+    abort("At least one argument must be supplied to coalesce()")
+  }
+  # TODO: if an option is added to the coalesce kernel to treat NaN as NA,
+  # use that to simplify the code here (ARROW-13389)
+  args <- lapply(list2(...), function(arg) {
+    if (!inherits(arg, "Expression")) {
+      arg <- Expression$scalar(arg)
+    }
+    if (arg$type_id() %in% TYPES_WITH_NAN) {
+      # replace NaN with NA, using Arrow's smallest float type to avoid casting
+      # smaller float types to larger float types

Review comment:
       Fixed in 3a91ee9

##########
File path: r/R/dplyr-functions.R
##########
@@ -57,6 +57,30 @@ nse_funcs$cast <- function(x, target_type, safe = TRUE, ...) 
{
   Expression$create("cast", x, options = opts)
 }
 
+nse_funcs$coalesce <- function(...) {
+  if (missing(..1)) {
+    abort("At least one argument must be supplied to coalesce()")
+  }
+  # TODO: if an option is added to the coalesce kernel to treat NaN as NA,
+  # use that to simplify the code here (ARROW-13389)
+  args <- lapply(list2(...), function(arg) {
+    if (!inherits(arg, "Expression")) {
+      arg <- Expression$scalar(arg)
+    }
+    if (arg$type_id() %in% TYPES_WITH_NAN) {
+      # replace NaN with NA, using Arrow's smallest float type to avoid casting
+      # smaller float types to larger float types

Review comment:
       Fixed in 3a91ee9. I used an attribute named `last` to keep track of 
which iteration is the final one.

##########
File path: r/R/dplyr-functions.R
##########
@@ -57,6 +57,30 @@ nse_funcs$cast <- function(x, target_type, safe = TRUE, ...) 
{
   Expression$create("cast", x, options = opts)
 }
 
+nse_funcs$coalesce <- function(...) {
+  if (missing(..1)) {
+    abort("At least one argument must be supplied to coalesce()")
+  }
+  # TODO: if an option is added to the coalesce kernel to treat NaN as NA,
+  # use that to simplify the code here (ARROW-13389)
+  args <- lapply(list2(...), function(arg) {
+    if (!inherits(arg, "Expression")) {
+      arg <- Expression$scalar(arg)
+    }
+    if (arg$type_id() %in% TYPES_WITH_NAN) {
+      # replace NaN with NA, using Arrow's smallest float type to avoid casting
+      # smaller float types to larger float types

Review comment:
       Fixed now. I used an attribute named `last` to keep track of which 
iteration is the final one.

##########
File path: r/R/dplyr-functions.R
##########
@@ -57,6 +57,30 @@ nse_funcs$cast <- function(x, target_type, safe = TRUE, ...) 
{
   Expression$create("cast", x, options = opts)
 }
 
+nse_funcs$coalesce <- function(...) {
+  if (missing(..1)) {
+    abort("At least one argument must be supplied to coalesce()")
+  }
+  # TODO: if an option is added to the coalesce kernel to treat NaN as NA,
+  # use that to simplify the code here (ARROW-13389)
+  args <- lapply(list2(...), function(arg) {
+    if (!inherits(arg, "Expression")) {
+      arg <- Expression$scalar(arg)
+    }
+    if (arg$type_id() %in% TYPES_WITH_NAN) {
+      # replace NaN with NA, using Arrow's smallest float type to avoid casting
+      # smaller float types to larger float types

Review comment:
       FYI, when all values in the row are `NA` or `NaN`, `dplyr::coalesce()` 
returns the _last_ one, which is why this solution works:
   ```
   > dplyr::coalesce(NA, NaN)
   [1] NaN
   > dplyr::coalesce(NaN, NA)
   [1] NA
   ```

##########
File path: r/R/dplyr-functions.R
##########
@@ -57,6 +57,30 @@ nse_funcs$cast <- function(x, target_type, safe = TRUE, ...) 
{
   Expression$create("cast", x, options = opts)
 }
 
+nse_funcs$coalesce <- function(...) {
+  if (missing(..1)) {
+    abort("At least one argument must be supplied to coalesce()")
+  }
+  # TODO: if an option is added to the coalesce kernel to treat NaN as NA,
+  # use that to simplify the code here (ARROW-13389)
+  args <- lapply(list2(...), function(arg) {
+    if (!inherits(arg, "Expression")) {
+      arg <- Expression$scalar(arg)
+    }
+    if (arg$type_id() %in% TYPES_WITH_NAN) {
+      # replace NaN with NA, using Arrow's smallest float type to avoid casting
+      # smaller float types to larger float types

Review comment:
       FYI, when all values in the row are `NA` or `NaN`, `dplyr::coalesce()` 
returns the _last_ one, which is why this solution works:
   ```
   > dplyr::coalesce(NA_real_, NaN)
   [1] NaN
   > dplyr::coalesce(NaN, NA_real_)
   [1] NA
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

[GitHub] [arrow] ianmcook commented on a change in pull request #10751: ARROW-13201: [R] Add binding for coalesce()

Reply via email to