dario-liberman commented on issue #12057:
URL: https://github.com/apache/pinot/issues/12057#issuecomment-1842696322
@Jackie-Jiang - I think we can use some of the building blocks of
`serverReturnFinalResult`, but it still has two gaps as far as I can see.
1. We still need a way to aggregate across servers. My point is that the
merge functions within a partition and across partitions are different. For
example, when calculating distinct counts, the former merge is a union of
sets whereas the latter merge is a sum of numbers.
2. If we only worry about the broker, we miss the optimisation that can
be done at the server level through more efficient cross-partition
aggregation (or, more importantly, lower memory cost within a partition, such
as smaller sets for distinct count). There could be an order of magnitude (or
more) difference between partition count and server count (e.g. 512 partitions
split across 16 servers). Especially with replication, a deployment could have
a large cluster in which a small number of servers hold enough co-located
partitions to return the result for a given query.
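To make the difference between the two merges concrete, here is a minimal
sketch. The class and method names are made up for illustration and are not
part of Pinot. It assumes the column is partitioned, so a value can only ever
appear in one partition.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; none of these names come from Pinot.
class TwoLevelDistinctCount {

    // Within a partition, segments may repeat values, so the intermediate
    // results are merged with a set union.
    static Set<String> mergeWithinPartition(List<Set<String>> segmentSets) {
        Set<String> union = new HashSet<>();
        for (Set<String> segment : segmentSets) {
            union.addAll(segment);
        }
        return union;
    }

    // Across partitions, a value lives in exactly one partition (assumption),
    // so the per-partition counts can simply be added.
    static long mergeAcrossPartitions(List<Long> partitionCounts) {
        long total = 0;
        for (long count : partitionCounts) {
            total += count;
        }
        return total;
    }

    // Each inner list holds the per-segment value sets of one partition.
    static long distinctCount(List<List<Set<String>>> partitions) {
        List<Long> counts = new ArrayList<>();
        for (List<Set<String>> segments : partitions) {
            // Only a number leaves the partition, not the whole set.
            counts.add((long) mergeWithinPartition(segments).size());
        }
        return mergeAcrossPartitions(counts);
    }
}
```

Each set here is bounded by the cardinality of a single partition rather than
of the whole column, which is the memory saving mentioned in point 2.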
To address both points above, I don't see how we can do it without
explicit partitioned support in `AggregationFunction` (or, as I say above, we
can have a `PartitionedAggregationFunction` sub-class and check at run time
whether the object implements it).
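A rough sketch of what that run-time check could look like follows. The two
interfaces are heavily simplified stand-ins nested in a wrapper class, not
Pinot's real `AggregationFunction`, and `mergeFinalResults`, `aggregate` and
`reduce` are hypothetical names I am inventing for the illustration.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Simplified stand-ins, NOT Pinot's real interfaces.
class PartitionedAggSketch {

    // Hypothetical, reduced aggregation function: I = intermediate, F = final.
    interface AggregationFunction<I, F> {
        I merge(I left, I right);
        F extractFinalResult(I intermediate);
    }

    // Hypothetical sub-interface adding a merge over per-partition finals.
    interface PartitionedAggregationFunction<I, F> extends AggregationFunction<I, F> {
        F mergeFinalResults(F left, F right);
    }

    static final class DistinctCount
            implements PartitionedAggregationFunction<Set<String>, Long> {
        public Set<String> merge(Set<String> left, Set<String> right) {
            Set<String> union = new HashSet<>(left);
            union.addAll(right);
            return union;
        }
        public Long extractFinalResult(Set<String> set) { return (long) set.size(); }
        public Long mergeFinalResults(Long left, Long right) { return left + right; }
    }

    // Merges the (non-empty) intermediate results of one partition.
    static <I, F> I reduce(AggregationFunction<I, F> fn, List<I> items) {
        I acc = items.get(0);
        for (int i = 1; i < items.size(); i++) {
            acc = fn.merge(acc, items.get(i));
        }
        return acc;
    }

    // partitions: per partition, the intermediate results of its segments.
    // Assumes at least one partition and one segment per partition.
    static <I, F> F aggregate(AggregationFunction<I, F> fn,
                              List<List<I>> partitions,
                              boolean columnIsPartitioned) {
        if (columnIsPartitioned && fn instanceof PartitionedAggregationFunction) {
            PartitionedAggregationFunction<I, F> pfn =
                    (PartitionedAggregationFunction<I, F>) fn;
            F result = null;
            for (List<I> segments : partitions) {
                // Finalise inside the partition, then merge the small finals.
                F partial = pfn.extractFinalResult(reduce(pfn, segments));
                result = (result == null) ? partial : pfn.mergeFinalResults(result, partial);
            }
            return result;
        }
        // Fallback: merge every intermediate result, finalise once at the end.
        I acc = null;
        for (List<I> segments : partitions) {
            I partition = reduce(fn, segments);
            acc = (acc == null) ? partition : fn.merge(acc, partition);
        }
        return fn.extractFinalResult(acc);
    }
}
```

Functions that do not implement the sub-interface keep going through the
existing merge path, so support could be added one function at a time.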
@walterddr - I was not aware of the support for partitioned multi-stage
query optimisations, thanks for sharing. I agree this work could also
benefit partitioned optimisations in the multi-stage engine. However, as I say
above, optimising aggregations where the group-by column is partitioned is a
different problem from optimising aggregations where the partitioning allows
the aggregation itself to be optimised (e.g. distinct count over a partitioned
column). I argue that the aggregation function needs to be written in a way
that allows for the optimisation.
There is a way to make this more generic, I guess: we could have a
pseudo function representing the partition id (it could be just a number from
1 to N), then ask users to write the query with two clearly separate phases.
For example:
```
SELECT SUM(partitioned_agg) FROM (
  SELECT DISTINCT_COUNT(partition_column) AS partitioned_agg FROM table
  GROUP BY PARTITION_ID(partition_column)
)
```
Then it is the user who needs to know which function to use to aggregate
across partitions.
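For what it's worth, the two phases of that query can be emulated in a few
lines. This is only a sketch: `partitionId` is a made-up stand-in for the
`PARTITION_ID` pseudo function (here a hash modulo the partition count), and
the class name is hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical emulation of the two-phase query; not Pinot code.
class TwoPhaseQuerySketch {

    // Stand-in for PARTITION_ID: any deterministic function of the value
    // guarantees that equal values land in the same partition.
    static int partitionId(String value, int numPartitions) {
        return Math.floorMod(value.hashCode(), numPartitions);
    }

    static long distinctCountViaPartitions(List<String> column, int numPartitions) {
        // Inner select: DISTINCT_COUNT(col) grouped by PARTITION_ID(col).
        Map<Integer, Set<String>> groups = new HashMap<>();
        for (String value : column) {
            groups.computeIfAbsent(partitionId(value, numPartitions), k -> new HashSet<>())
                  .add(value);
        }
        // Outer select: SUM over the per-partition distinct counts.
        long sum = 0;
        for (Set<String> group : groups.values()) {
            sum += group.size();
        }
        return sum;
    }
}
```

The outer `SUM` is only correct because the grouping key is derived from the
same column being counted. With a different outer function, or a column that
is not partitioned this way, the result would be wrong, and that is exactly
the knowledge this approach pushes onto the user.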
I did not propose this because, in my view, the engine should figure out how
to optimise for partitioning, not the user. But it is an option, I guess. We
could maybe even introduce appropriate hints for both selects above.