AVG) in the multi-stage query engine [pinot]

via GitHub Fri, 20 Dec 2024 01:36:51 -0800


yashmayya opened a new pull request, #14689:
URL: https://github.com/apache/pinot/pull/14689


   - Fixes https://github.com/apache/pinot/issues/12705.
   - As discussed in the above issue, the problem is that in final aggregates 
the actual return type from the aggregation function will be a double value 
(from `MIN` / `MAX` / `SUM` / `AVG`) and this is attempted to be cast to 
`BigDecimal` when constructing the data block to be sent over the wire. The 
cast is attempted because the return type for `MIN` / `MAX` / `SUM` / `AVG` is 
inferred as `DECIMAL` when the input operand type is `DECIMAL` (or 
`BIG_DECIMAL` in Pinot) due to the use of the standard operators in Calcite. We 
don't want to change this type inference logic because it's backward 
incompatible (the actual return type for users would change if we change this) 
and also because we want to move towards actual polymorphic aggregation 
functions (i.e., not just the appearance of polymorphism through casting). Note 
that this isn't an issue in the leaf aggregation because the return type there 
is determined as `DOUBLE` through the 
[AggregationFunctionType](https://github.com/apache/
 
pinot/blob/d15f8c8caf531b29a839801a8afc08dc314c6d8e/pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/AggregationFunctionType.java#L54-L59)
 (see 
[here](https://github.com/apache/pinot/blob/d15f8c8caf531b29a839801a8afc08dc314c6d8e/pinot-query-planner/src/main/java/org/apache/pinot/calcite/rel/rules/PinotAggregateExchangeNodeInsertRule.java#L326-L331)).
   - The type conversion logic added to `TypeUtils::convert` (that handles 
single-stage -> multi-stage type conversion) includes a type check before 
converting because there are some aggregation functions like `SUMPRECISION` 
that actually return `BigDecimal` values and we don't want to lose precision 
there by converting through doubles.
   - In the long term, we want to support truly polymorphic aggregation 
functions in the multi-stage query engine (i.e., where the MAX aggregation 
function would return a value of the same type as the input operand type - 
right now it's always `DOUBLE`). However, this has backward compatibility 
implications and we need to be careful about not breaking the v1 engine. Right 
now, the output type to end users will still be the same as the input operand 
type in the MSQE but there can be loss of precision (`MAX(BIG_DECIMAL col) -> 
DOUBLE output -> cast to BIG_DECIMAL (loss of precision)` for instance).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[PR] Fix broken BIG_DECIMAL aggregations (MIN / MAX / SUM / AVG) in the multi-stage query engine [pinot]

Reply via email to