[ 
https://issues.apache.org/jira/browse/ARROW-17361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17633378#comment-17633378
 ] 

Apache Arrow JIRA Bot commented on ARROW-17361:
-----------------------------------------------

This issue was last updated over 90 days ago, which may be an indication it is 
no longer being actively worked. To better reflect the current state, the issue 
is being unassigned per [project 
policy|https://arrow.apache.org/docs/dev/developers/bug_reports.html#issue-assignment].
 Please feel free to re-take assignment of the issue if it is being actively 
worked, or if you plan to start that work soon.

> [R] dplyr::summarize fails with division when divisor is a variable
> -------------------------------------------------------------------
>
>                 Key: ARROW-17361
>                 URL: https://issues.apache.org/jira/browse/ARROW-17361
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: R
>    Affects Versions: 8.0.0
>            Reporter: Oliver Reiter
>            Assignee: Dragoș Moldovan-Grünfeld
>            Priority: Minor
>              Labels: aggregation, dplyr
>
> Hello,
> I found this odd behaviour when trying to compute an aggregate with 
> dplyr::summarize: when I use a pre-defined variable as the divisor 
> while aggregating, the execution fails with 'unsupported expression'. When I 
> use the value of the variable directly in the aggregation, it works.
>  
> See below:
>  
> {code:java}
> library(dplyr)
> library(arrow)
> small_dataset <- tibble::tibble(
>   ## x = rep(c("a", "b"), each = 5),
>   y = rep(1:5, 2)
> )
> ## convert "small_dataset" into a ...dataset
> tmpdir <- tempfile()
> dir.create(tmpdir)
> write_dataset(small_dataset, tmpdir)
> ## works
> open_dataset(tmpdir) %>%
>   summarize(value = sum(y) / 10) %>%
>   collect()
> ## fails
> scale_factor <- 10
> open_dataset(tmpdir) %>%
>   summarize(value = sum(y) / scale_factor) %>%
>   collect()
> #> Error: Error in summarize_eval(names(exprs)[i],
> #> exprs[[i]], ctx, length(.data$group_by_vars) > :
> #>   Expression sum(y)/scale_factor is not an aggregate
> #>   expression or is not supported in Arrow
> #> Call collect() first to pull data into R.
>    {code}
> I was not sure how to name this issue/bug (if it is one), so if there is a 
> clearer, more descriptive title you're welcome to adjust.
>  
> Thanks for your work!
>  
> Oliver
>  
> {code:java}
> > arrow_info()
> Arrow package version: 8.0.0
> Capabilities:
>                
> dataset    TRUE
> substrait FALSE
> parquet    TRUE
> json       TRUE
> s3         TRUE
> utf8proc   TRUE
> re2        TRUE
> snappy     TRUE
> gzip       TRUE
> brotli     TRUE
> zstd       TRUE
> lz4        TRUE
> lz4_frame  TRUE
> lzo       FALSE
> bz2        TRUE
> jemalloc   TRUE
> mimalloc   TRUE
> Memory:
>                   
> Allocator jemalloc
> Current   64 bytes
> Max       41.25 Kb
> Runtime:
>                         
> SIMD Level          avx2
> Detected SIMD Level avx2
> Build:
>                            
> C++ Library Version   8.0.0
> C++ Compiler            GNU
> C++ Compiler Version 12.1.0 {code}
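
The error message in the report suggests calling collect() before doing the unsupported part of the expression. A minimal sketch of that approach, reusing the reporter's toy data, could look like the following. The column name `total` and the object name `res` are illustrative and not from the report, and this is only one possible workaround, not a fix confirmed in this thread.

```r
library(dplyr)
library(arrow)

# Same toy data as in the report, written out as an Arrow dataset.
small_dataset <- tibble::tibble(y = rep(1:5, 2))
tmpdir <- tempfile()
dir.create(tmpdir)
write_dataset(small_dataset, tmpdir)

scale_factor <- 10

# Let Arrow compute only the aggregate, pull the one-row result into R,
# and apply the division by the R variable there.
# `total` is a hypothetical intermediate column name.
res <- open_dataset(tmpdir) %>%
  summarize(total = sum(y)) %>%
  collect() %>%
  mutate(value = total / scale_factor) %>%
  select(value)

print(res)
```

Because only the aggregated row is collected, the data transferred into R stays small even for a large dataset. With the toy data above, `value` should come out as 3 (30 / 10).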



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
