nealrichardson commented on code in PR #13934:
URL: https://github.com/apache/arrow/pull/13934#discussion_r973244901


##########
r/R/dplyr-distinct.R:
##########
@@ -35,8 +29,24 @@ distinct.arrow_dplyr_query <- function(.data, ..., .keep_all 
= FALSE) {
     # distinct() with no vars specified means distinct across all cols
     .data <- dplyr::group_by(.data, !!!syms(names(.data)))
   }
-
-  out <- dplyr::summarize(.data, .groups = "drop")
+  if (isTRUE(.keep_all)) {
+    # (TODO) `.keep_all = TRUE` can return first row value, but this 
implementation
+    # do not always return it because `hash_one` skips rows if they contain 
null value.
+    # If group vars do not uniquely determine return values of each cols,
+    # the result will become different from the original.
+    # If NOT, this option may distroy data.
+    warning(".keep_all = TRUE currently not guarantee to take first row value 
in each cols.")

Review Comment:
   Instead of raising a warning, can you rebase this PR, and in 
https://github.com/apache/arrow/blob/master/r/R/arrow-package.R#L53, leave the 
note that ".keep_all = TRUE will keep one non-missing value for each column but 
it may not be the 'first row'"



##########
r/src/compute.cpp:
##########
@@ -188,6 +188,15 @@ std::shared_ptr<arrow::compute::FunctionOptions> 
make_compute_options(
     return out;
   }
 
+  if (func_name == "one" || func_name == "hash_one") {
+    // (TODO) FunctionOption for `hash_one` has not been implemented yet, so

Review Comment:
   Does it need options? Do we have other cases where we are calling functions 
without options (surely we do), and how are those handled?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to