thisisnic commented on issue #45373:
URL: https://github.com/apache/arrow/issues/45373#issuecomment-3689322433
My initial thought is that the problem could be somewhere in the query
optimisation that is done in R before things are passed to the C++ layer - CC
@nealrichardson
I checked that it isn't something weird in the `across()` binding; the same
error occurs with a plain `summarize()` call:
```r
library(arrow)
library(dplyr)

arrow_table(mtcars) |>
  arrange(mpg) |>
  summarize(min_mpg = min(mpg)) |>
  collect()
#> Error in `compute.arrow_dplyr_query()`:
#> ! Invalid: Invalid sort key column: No match for FieldRef.Name(mpg) in min_mpg: double
#> ----
#> min_mpg:
#>   [
#>     [
#>       10.4
#>     ]
#>   ]
#> Run `rlang::last_trace()` to see where the error occurred.
```
As a temporary workaround, you can, for some reason, call `slice_head(n =
nrow(data))` after `arrange()` - it appears to force some ordering of
operations, e.g.
``` r
library(arrow)
library(dplyr)
nrows <- nrow(mtcars)
arrow_table(mtcars) |>
  arrange(mpg) |>
  slice_head(n = nrows) |>
  summarize(across(mpg, list(Min = min, Max = max))) |>
  collect()
#> # A tibble: 1 × 2
#>   mpg_Min mpg_Max
#>     <dbl>   <dbl>
#> 1    10.4    33.9
```
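Another possible stopgap, offered only as a sketch: calling `collect()` before `summarize()` moves the aggregation into plain dplyr, so the sort key should never reach the Arrow aggregation step. This assumes the sorted data fits in R memory, which is fine for `mtcars` but may not hold for larger datasets; `result` is just an illustrative name.

``` r
library(arrow)
library(dplyr)

# Assumption: the sorted data is small enough to materialise in R.
result <- arrow_table(mtcars) |>
  arrange(mpg) |>
  collect() |>                                        # pull into an R data frame first
  summarize(across(mpg, list(Min = min, Max = max)))  # evaluated by dplyr, not Arrow

result
```

This gives up Arrow's out-of-memory evaluation for the aggregation, so the `slice_head()` trick is probably the better fit when the data is large.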
Don't have time to delve deeply into it right now, but will point an LLM at
it and see what it comes up with.