[
https://issues.apache.org/jira/browse/ARROW-17361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17633378#comment-17633378
]
Apache Arrow JIRA Bot commented on ARROW-17361:
-----------------------------------------------
This issue was last updated over 90 days ago, which may indicate that it is
no longer being actively worked on. To better reflect the current state, the issue
is being unassigned per [project
policy|https://arrow.apache.org/docs/dev/developers/bug_reports.html#issue-assignment].
Please feel free to reassign the issue to yourself if you are actively
working on it, or if you plan to start that work soon.
> [R] dplyr::summarize fails with division when divisor is a variable
> -------------------------------------------------------------------
>
> Key: ARROW-17361
> URL: https://issues.apache.org/jira/browse/ARROW-17361
> Project: Apache Arrow
> Issue Type: Bug
> Components: R
> Affects Versions: 8.0.0
> Reporter: Oliver Reiter
> Assignee: Dragoș Moldovan-Grünfeld
> Priority: Minor
> Labels: aggregation, dplyr
>
> Hello,
> I found this odd behaviour when trying to compute an aggregate with
> dplyr::summarize: when I use a pre-defined variable as the divisor
> while aggregating, execution fails with 'unsupported expression'. When I
> use the value of the variable directly in the aggregation, it works.
>
> See below:
>
> {code:java}
> library(dplyr)
> library(arrow)
> small_dataset <- tibble::tibble(
>   ## x = rep(c("a", "b"), each = 5),
>   y = rep(1:5, 2)
> )
> ## convert "small_dataset" into a ...dataset
> tmpdir <- tempfile()
> dir.create(tmpdir)
> write_dataset(small_dataset, tmpdir)
> ## works
> open_dataset(tmpdir) %>%
>   summarize(value = sum(y) / 10) %>%
>   collect()
> ## fails
> scale_factor <- 10
> open_dataset(tmpdir) %>%
>   summarize(value = sum(y) / scale_factor) %>%
>   collect()
> #> Error: Error in summarize_eval(names(exprs)[i],
> #> exprs[[i]], ctx, length(.data$group_by_vars) > :
> #> Expression sum(y)/scale_factor is not an aggregate
> #> expression or is not supported in Arrow
> #> Call collect() first to pull data into R.
> {code}
> I was not sure how to name this issue/bug (if it is one), so if there is a
> clearer, more descriptive title you're welcome to adjust.
>
> Thanks for your work!
>
> Oliver
>
> {code:java}
> > arrow_info()
> Arrow package version: 8.0.0
> Capabilities:
>
> dataset TRUE
> substrait FALSE
> parquet TRUE
> json TRUE
> s3 TRUE
> utf8proc TRUE
> re2 TRUE
> snappy TRUE
> gzip TRUE
> brotli TRUE
> zstd TRUE
> lz4 TRUE
> lz4_frame TRUE
> lzo FALSE
> bz2 TRUE
> jemalloc TRUE
> mimalloc TRUE
> Memory:
>
> Allocator jemalloc
> Current 64 bytes
> Max 41.25 Kb
> Runtime:
>
> SIMD Level avx2
> Detected SIMD Level avx2
> Build:
>
> C++ Library Version 8.0.0
> C++ Compiler GNU
> C++ Compiler Version 12.1.0
> {code}