[GitHub] [arrow-datafusion] andygrove commented on pull request #5866: Improve avg/sum Aggregator performance

via GitHub Thu, 06 Apr 2023 07:09:27 -0700


andygrove commented on PR #5866:
URL: 
https://github.com/apache/arrow-datafusion/pull/5866#issuecomment-1499125555


   > > I tried testing the changes in this PR and ran into some errors when 
running query 1 using the code in 
https://github.com/sql-benchmarks/sqlbench-runners/tree/main/datafusion
   > > ```
   > > thread 'tokio-runtime-worker' panicked at 'Unexpected accumulator state 
in hash aggregate: Internal("Arithmetic Overflow in AvgAccumulator")', 
/home/andy/.cargo/git/checkouts/arrow-datafusion-bfd9a8de51c58474/4e6eac5/datafusion/core/src/physical_plan/aggregates/row_hash.rs:642:81
   > > thread 'tokio-runtime-worker' panicked at 'Unexpected accumulator state 
in hash aggregate: Internal("Arithmetic Overflow in AvgAccumulator")', 
/home/andy/.cargo/git/checkouts/arrow-datafusion-bfd9a8de51c58474/4e6eac5/datafusion/core/src/physical_plan/aggregates/row_hash.rs:642:81
   > > thread 'tokio-runtime-worker' panicked at 'Unexpected accumulator state 
in hash aggregate: Internal("Arithmetic Overflow in AvgAccumulator")', 
/home/andy/.cargo/git/checkouts/arrow-datafusion-bfd9a8de51c58474/4e6eac5/datafusion/core/src/physical_plan/aggregates/row_hash.rs:642:81
   > > ```
   > > I don't see these errors when running against the latest in the main 
branch.
   > 
   > I cannot reproduce the issue using DataFusion's own benchmark 
data (sf=10), but I can reproduce it using Spark-generated 
benchmark data. I guess Spark's TPC-H data schema is different from DataFusion's.
   
   Maybe decimals vs floats? Official TPC-H uses decimals.
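
To illustrate why the column type could matter here, below is a minimal sketch of how a checked, fixed-precision decimal sum can report an overflow on inputs that a float sum accepts without complaint. This is only an assumption about the kind of failure involved, not DataFusion's actual `AvgAccumulator` code; the names `checked_decimal_sum` and `float_sum` and the precision values are hypothetical.

```rust
// Hypothetical illustration only; this is NOT DataFusion's AvgAccumulator.

/// Sum unscaled decimal values (e.g. 12.34 at scale 2 is stored as 1234).
/// Returns None if the running total no longer fits in `precision` digits.
fn checked_decimal_sum(values: &[i128], precision: u32) -> Option<i128> {
    // Smallest unscaled magnitude that needs more than `precision` digits.
    let limit = 10_i128.checked_pow(precision)?;
    let mut sum: i128 = 0;
    for &v in values {
        // checked_add guards against wrapping the underlying integer.
        sum = sum.checked_add(v)?;
        // The precision bound is usually hit long before i128 itself overflows.
        if sum.unsigned_abs() >= limit as u128 {
            return None;
        }
    }
    Some(sum)
}

/// The same sum over f64 has no such bound for inputs of this size.
fn float_sum(values: &[f64]) -> f64 {
    values.iter().sum()
}

fn main() {
    // Three values of 900.00 stored at scale 2.
    let unscaled = [90_000_i128, 90_000, 90_000];

    // With 6 digits of precision the total 2700.00 still fits.
    println!("{:?}", checked_decimal_sum(&unscaled, 6)); // Some(270000)

    // With 5 digits of precision the running total exceeds 999.99.
    println!("{:?}", checked_decimal_sum(&unscaled, 5)); // None

    // The float version just returns a number.
    println!("{}", float_sum(&[900.0, 900.0, 900.0])); // 2700
}
```

If the Spark-generated files store the price columns as decimals while DataFusion's own benchmark data uses floats, only the decimal path would go through a check of this kind, which would be consistent with the error showing up on one data set and not the other.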


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
