nealrichardson commented on a change in pull request #10215:
URL: https://github.com/apache/arrow/pull/10215#discussion_r623993426
##########
File path: r/R/compute.R
##########
@@ -267,6 +267,25 @@ value_counts <- function(x) {
call_function("value_counts", x)
}
+
+#' `variance` and `stddev` for Arrow objects
Review comment:
I'm not sure about this. Unfortunately, `sd()` and `var()` aren't
generics so we can't just define methods for them. So it might not be worth
adding these wrappers at all.
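
To illustrate the point about generics, here is a small base R sketch. The class name `my_array` is hypothetical and stands in for any non-base class; nothing here uses the arrow package. It shows that `mean()` dispatches on class while `sd()` and `var()` are ordinary functions.

```r
# mean() is an S3 generic: its body is a call to UseMethod(),
# so a class can supply its own method.
mean.my_array <- function(x, ...) "mean method for my_array"

# Hypothetical class used only for this illustration.
obj <- structure(list(), class = "my_array")
dispatched <- mean(obj)  # calls mean.my_array()

# utils::isS3stdGeneric() reports whether a function is a standard S3 generic.
mean_is_generic <- unname(utils::isS3stdGeneric(mean))
sd_is_generic   <- utils::isS3stdGeneric(stats::sd)
var_is_generic  <- utils::isS3stdGeneric(stats::var)
```

Because `sd()` and `var()` do not call `UseMethod()`, defining `sd.my_array` or `var.my_array` would have no effect on calls to `sd(obj)` or `var(obj)`.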
##########
File path: r/src/compute.cpp
##########
@@ -232,7 +232,12 @@ std::shared_ptr<arrow::compute::FunctionOptions> make_compute_options(
cpp11::as_cpp<std::string>(options["replacement"]),
max_replacements);
}
-
+
+ if (func_name == "variance" || func_name == "stddev") {
Review comment:
TBH this is probably the only code addition we want to keep here.
##########
File path: r/R/compute.R
##########
@@ -267,6 +267,25 @@ value_counts <- function(x) {
call_function("value_counts", x)
}
+
+#' `variance` and `stddev` for Arrow objects
+#'
+#' These functions calculate the variance and standard deviation of Arrow arrays
+#' @param x `Array` or `ChunkedArray`
+#' @param ddof The divisor used in calculations is N - ddof, where N is the number of elements.
+#' By default, ddof is zero, and population variance or stddev is returned.
+#' @return A `Scalar` containing the calculated value.
+#' @export
+stddev <- function(x, ddof = 0) {
+ call_function("stddev", x, options = list(ddof = ddof))
Review comment:
Is there no `na.rm` handling in the Arrow stddev and variance functions?
If not, there should be (please open a JIRA issue for it).
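
For reference on the `ddof` and `na.rm` points, here is a base R sketch of what the two options mean. `variance_ddof` is a hypothetical helper written for this illustration; it is not an Arrow compute function or part of the arrow package. Note that base R's `var()` and `sd()` divide by N - 1, which corresponds to `ddof = 1`, so a default of `ddof = 0` gives a different result than base R.

```r
# Hypothetical helper: variance with divisor N - ddof and optional NA removal.
variance_ddof <- function(x, ddof = 0, na.rm = FALSE) {
  if (na.rm) {
    x <- x[!is.na(x)]      # drop missing values before computing
  } else if (anyNA(x)) {
    return(NA_real_)       # mirror base R: missing input gives a missing result
  }
  n <- length(x)
  sum((x - mean(x))^2) / (n - ddof)
}

x <- c(2, 4, 4, 4, 5, 5, 7, 9)
pop_var    <- variance_ddof(x)            # divisor N (population variance)
sample_var <- variance_ddof(x, ddof = 1)  # divisor N - 1, same as stats::var(x)
with_na    <- variance_ddof(c(1, 3, NA))
without_na <- variance_ddof(c(1, 3, NA), na.rm = TRUE)
```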
##########
File path: r/R/dplyr.R
##########
@@ -480,6 +480,18 @@ build_function_list <- function(FUN) {
between = function(x, left, right) {
x >= left & x <= right
},
+ sd = function(x, na.rm = FALSE){
+ if (!na.rm && x$null_count > 0) {
+ return(Scalar$create(NA_real_))
+ }
Review comment:
We don't support aggregations in our dplyr backend yet, so this should
never succeed. If `sd()` doesn't cleanly and always error when called on an
Arrow Expression, we should force it to. See the "fail" handling inside of
`arrow_eval`, where this is done for `mean`.
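
A minimal base R sketch of the kind of forced failure described here. The names `fail`, `mask`, and `eval_in_mask` are hypothetical and this is not the actual `arrow_eval` code; it only shows the idea of shadowing an aggregation function with one that always errors, so the call cannot silently succeed.

```r
# Hypothetical stand-in: any masked function errors unconditionally.
fail <- function(...) stop("Not implemented")

# Hypothetical mask: an environment whose bindings shadow the real functions.
mask <- new.env()
for (f in c("mean", "sd", "var")) {
  assign(f, fail, envir = mask)
}

# Evaluate an expression so that function lookup reaches the mask
# before the real stats::sd(), and report any error as a string.
eval_in_mask <- function(expr, data) {
  tryCatch(
    eval(expr, data, enclos = mask),
    error = function(e) paste("error:", conditionMessage(e))
  )
}

masked   <- eval_in_mask(quote(sd(x)), list(x = c(1, 2, 3)))
unmasked <- eval_in_mask(quote(x + 1), list(x = 1))
```

Functions that are not in the mask, such as `+`, still resolve normally, so only the listed aggregations are forced to fail.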