r2evans commented on issue #45373:
URL: https://github.com/apache/arrow/issues/45373#issuecomment-3690224763

   For the record: In the original example, we do lose the sort-column in the 
summarization, but if we `mutate` instead of `summarize`, we should still have 
the sort column so the sort is likely still appropriate (to the user, even if 
it may not affect the calculations).
   
   ```r
   arrow_table(mtcars) |>
     arrange(mpg) |>
     mutate(mpg_head = mean(mpg)) |>
     collect()
   # Error in `compute.arrow_dplyr_query()`:
   # ! Invalid: Invalid sort key column: No match for FieldRef.Name(mpg) in 
..temp0: double
   # ----
   # ..temp0:
   #   [
   #     [
   #       20.090625
   #     ]
   #   ]
   # Run `rlang::last_trace()` to see where the error occurred.
   ```
   
   Moving the `arrange()` to _after_ `mutate()` works here, and it _may_ work 
with the `summarize()` (if the code were slightly different ... unimportant 
atm).
   
   Are you suggesting that it is best to silently drop the sort? If we cannot 
preserve `$arrange_*` during the collapse stage, some thoughts:
   
   1. Drop the sort column(s) with a `warning()`. I think this is important, 
since the user needs to know that the assumption of ordered data is no longer 
true.
   2. Is there a way to detect that this problem will occur and rearrange the 
steps such that `arrange()` occurs after the collapse? This presents the 
possibility that the column may not be available:
       - If it is not visible post-collapse (e.g., `summarize()` that drops the 
sort columns), then warn-and-drop;
       - If it is visible post-collapse, then rearrange the calc-steps so that 
the sorting is after the collapse-step(s).
   
   BTW: I had used the `slice_head()` trick, but I ran into problems and had to 
remove it, whether for performance reasons or some other fragility. I can't 
find "why" I removed it from my code, so this is left dangling atm ...
   
   Bottom line: while I understand that removing the sort in this case might be 
the hack to stop the error, I think it needs to be done with a warning instead 
of quietly. Thoughts?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to