[
https://issues.apache.org/jira/browse/ARROW-17361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17579737#comment-17579737
]
Dragoș Moldovan-Grünfeld commented on ARROW-17361:
--------------------------------------------------
Thanks for reporting this. I can confirm it is an unintended consequence of how
we do the evaluation inside {{summarise}}. We should definitely get this to
work. In the meantime, I can recommend a work-around using {{rlang}}'s
injection operator ({{!!}}):
{code:r}
open_dataset(tmpdir) %>%
  summarize(value = sum(y) / !!scale_factor) %>%
  collect()
#> # A tibble: 1 × 1
#>   value
#>   <dbl>
#> 1     3
{code}
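As a rough illustration of why the injection helps, here is a minimal sketch that uses only {{rlang}} (it does not call into arrow, and it is not how arrow evaluates {{summarise}} internally). It assumes {{y}} is just a column name and only inspects the captured expressions: with {{!!}}, the current value of {{scale_factor}} is spliced in when the expression is captured, so what is captured looks like the literal {{sum(y) / 10}} form that already works.
{code:r}
library(rlang)

scale_factor <- 10

# Without injection, the captured expression still refers to the symbol
# scale_factor, which would have to be looked up later.
plain <- expr(sum(y) / scale_factor)

# With !!, the value 10 is inserted into the expression at capture time.
injected <- expr(sum(y) / !!scale_factor)

print(plain)
#> sum(y)/scale_factor
print(injected)
#> sum(y)/10
{code}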
> [R] dplyr::summarize fails with division when divisor is a variable
> -------------------------------------------------------------------
>
> Key: ARROW-17361
> URL: https://issues.apache.org/jira/browse/ARROW-17361
> Project: Apache Arrow
> Issue Type: Bug
> Components: R
> Affects Versions: 8.0.0
> Reporter: Oliver Reiter
> Assignee: Dragoș Moldovan-Grünfeld
> Priority: Minor
> Labels: aggregation, dplyr
>
> Hello,
> I found this odd behaviour when trying to compute an aggregate with
> dplyr::summarize: When I use a pre-defined variable as the divisor while
> aggregating, the execution fails with 'unsupported expression'. When I
> use the value of the variable directly in the aggregation, it works.
>
> See below:
>
> {code:r}
> library(dplyr)
> library(arrow)
> small_dataset <- tibble::tibble(
>   ## x = rep(c("a", "b"), each = 5),
>   y = rep(1:5, 2)
> )
> ## convert "small_dataset" into a ...dataset
> tmpdir <- tempfile()
> dir.create(tmpdir)
> write_dataset(small_dataset, tmpdir)
> ## works
> open_dataset(tmpdir) %>%
>   summarize(value = sum(y) / 10) %>%
>   collect()
> ## fails
> scale_factor <- 10
> open_dataset(tmpdir) %>%
>   summarize(value = sum(y) / scale_factor) %>%
>   collect()
> #> Error: Error in summarize_eval(names(exprs)[i],
> #> exprs[[i]], ctx, length(.data$group_by_vars) > :
> # Expression sum(y)/scale_factor is not an aggregate
> # expression or is not supported in Arrow
> # Call collect() first to pull data into R.
> {code}
> I was not sure how to name this issue/bug (if it is one), so if there is a
> clearer, more descriptive title you're welcome to adjust.
>
> Thanks for your work!
>
> Oliver
>
> {code:java}
> > arrow_info()
> Arrow package version: 8.0.0
> Capabilities:
>
> dataset TRUE
> substrait FALSE
> parquet TRUE
> json TRUE
> s3 TRUE
> utf8proc TRUE
> re2 TRUE
> snappy TRUE
> gzip TRUE
> brotli TRUE
> zstd TRUE
> lz4 TRUE
> lz4_frame TRUE
> lzo FALSE
> bz2 TRUE
> jemalloc TRUE
> mimalloc TRUE
> Memory:
>
> Allocator jemalloc
> Current 64 bytes
> Max 41.25 Kb
> Runtime:
>
> SIMD Level avx2
> Detected SIMD Level avx2
> Build:
>
> C++ Library Version 8.0.0
> C++ Compiler GNU
> C++ Compiler Version 12.1.0
> {code}