nealrichardson commented on code in PR #44652:
URL: https://github.com/apache/arrow/pull/44652#discussion_r1843769552

##########
r/R/dplyr-distinct.R:
##########
@@ -33,11 +27,28 @@ distinct.arrow_dplyr_query <- function(.data, ..., .keep_all = FALSE) {
     .data <- dplyr::group_by(.data, !!!syms(names(.data)))
   }
 
-  out <- dplyr::summarize(.data, .groups = "drop")
+  if (isTRUE(.keep_all)) {
+    # Note: in regular dplyr, `.keep_all = TRUE` returns the first row's value.
+    # However, Acero's `hash_one` function prefers returning non-null values.
+    # So, you'll get the same shape of data, but the values may differ.

Review Comment:
   It is documented on the acero man page; that's the change to arrow-package.R. I'd rather not add a one-time warning: that's a slippery slope if we're going to be chatty about every subtle difference between how Acero works and how dplyr works on data.frames.
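
For readers following the thread, the difference described in the code comment above can be sketched on a plain data.frame. This is only an emulation under stated assumptions: it does not call Acero or `hash_one`, the toy data frame `df` and the variable names are hypothetical, and the "first non-NA value per group" step simply mimics the preference for non-null values that the diff comment describes.

```r
library(dplyr)

# Hypothetical toy data: the first row for x == 1 has a missing y.
df <- data.frame(
  x = c(1, 1, 2),
  y = c(NA, "a", "b")
)

# dplyr on a data.frame: keeps the first row for each distinct x,
# even when that row's y is NA.
first_row <- distinct(df, x, .keep_all = TRUE)

# Emulation only (not Acero itself): take the first non-NA y per group,
# the kind of result the diff comment attributes to `hash_one`.
first_non_na <- df %>%
  group_by(x) %>%
  summarize(y = y[!is.na(y)][1], .groups = "drop")
```

Both results have one row per distinct `x` and the same columns, but the `y` value for `x == 1` differs between them, which is the "same shape, different values" caveat noted in the diff.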