Re: [I] Partitioned Aggregation Support [pinot]

via GitHub Wed, 06 Dec 2023 03:35:26 -0800


dario-liberman commented on issue #12057:
URL: https://github.com/apache/pinot/issues/12057#issuecomment-1842696322


   @Jackie-Jiang - I think we can use some of the building blocks of 
`serverReturnFinalResult`, but it still has two gaps as far as I can see.
   
   1. We still need a way to aggregate across servers. My point is that the 
merge functions within a partition and across partitions are different. For 
example, when calculating distinct counts, the former merge is a union of 
sets, whereas the latter is a sum of numbers.
   2. If we only worry about the broker, we still miss out on optimisations 
that can be done at the server level: more efficient cross-partition 
aggregation and, more importantly, lower memory cost within each partition 
(such as smaller sets for distinct count). There can be an order of magnitude 
(or more) difference between partition count and server count (e.g. 512 
partitions split across 16 servers). Especially with replication, a deployment 
could have a large cluster in which the partitions needed to answer a given 
query are co-located on a small number of servers.
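
To make point 1 concrete, here is a minimal Python sketch of the two merge 
levels for distinct count. It is not Pinot code: the function names and the 
`(partition_id, values)` segment layout are assumptions made up for 
illustration, and it relies on the column being partitioned so that no value 
appears in more than one partition.

```python
def merge_within_partition(a: set, b: set) -> set:
    # Segments of the same partition may share values, so we need a set union.
    return a | b


def merge_across_partitions(a: int, b: int) -> int:
    # Partitions are disjoint by assumption, so counts can simply be added.
    return a + b


def partitioned_distinct_count(segments) -> int:
    # segments: iterable of (partition_id, values) pairs (hypothetical layout).
    sets_by_partition = {}
    for partition_id, values in segments:
        current = sets_by_partition.get(partition_id, set())
        sets_by_partition[partition_id] = merge_within_partition(current, set(values))

    total = 0
    for distinct_values in sets_by_partition.values():
        total = merge_across_partitions(total, len(distinct_values))
    return total


# Two partitions (value % 2), two segments each.
segments = [
    (0, [0, 4, 8, 4]),
    (0, [8, 12]),
    (1, [1, 5]),
    (1, [5, 9, 1]),
]
partitioned_result = partitioned_distinct_count(segments)
global_result = len({v for _, values in segments for v in values})
```

In this toy input the largest set ever held is one partition's worth of values 
rather than the whole column, which is the memory saving mentioned in point 2.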
   
   To address both points above, I don't see how we can do it without 
explicit partitioning support in `AggregationFunction` (or, as I say above, we 
can have a `PartitionedAggregationFunction` sub-class and check at run time 
whether the object is an instance of it).
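
As a rough sketch of the sub-class idea, assuming heavily simplified 
interfaces: the Python classes below only borrow the names 
`AggregationFunction` and `PartitionedAggregationFunction` from this 
discussion. Their methods, and the `combine` helper, are hypothetical and do 
not mirror Pinot's actual Java API.

```python
from abc import ABC, abstractmethod
from functools import reduce


class AggregationFunction(ABC):
    @abstractmethod
    def merge(self, a, b):
        """Merge two intermediate results."""

    @abstractmethod
    def extract_final_result(self, intermediate):
        """Turn an intermediate result into the final value."""


class PartitionedAggregationFunction(AggregationFunction):
    @abstractmethod
    def merge_across_partitions(self, a, b):
        """Merge two per-partition final results."""


class DistinctCount(AggregationFunction):
    def merge(self, a, b):
        return a | b

    def extract_final_result(self, intermediate):
        return len(intermediate)


class PartitionedDistinctCount(DistinctCount, PartitionedAggregationFunction):
    def merge_across_partitions(self, a, b):
        return a + b


def combine(function, results_by_partition):
    # results_by_partition: dict of partition id -> non-empty list of intermediates.
    if isinstance(function, PartitionedAggregationFunction):
        # Finalise each partition on its own, then merge the small final results.
        finals = [
            function.extract_final_result(reduce(function.merge, intermediates))
            for intermediates in results_by_partition.values()
        ]
        return reduce(function.merge_across_partitions, finals)
    # Fallback: merge every intermediate result into one, as done without partition support.
    flat = [r for rs in results_by_partition.values() for r in rs]
    return function.extract_final_result(reduce(function.merge, flat))


results = {0: [{0, 4, 8}, {8, 12}], 1: [{1, 5}, {5, 9}]}
plain = combine(DistinctCount(), results)
partitioned = combine(PartitionedDistinctCount(), results)
```

Both paths give the same answer here; the difference is that the partitioned 
path never builds a set spanning more than one partition.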
   
   @walterddr - I was not aware of the support for partitioned multi-stage 
query optimisations, thanks for sharing. I agree this work could also benefit 
partitioned optimisations in the multi-stage engine. However, as I say above, 
optimising aggregations where the group-by column is partitioned is a 
different problem from optimising aggregations where the partitioning allows us 
to optimise the aggregation itself (e.g. distinct count over a partitioned 
column). I argue that the aggregation function needs to be written in a way 
that allows for the optimisation.
   
   There is a way to make this more generic, I guess: we could have a pseudo 
function representing the partition id (it could be just a number from 1 to 
N), and then ask the user to write the query with two clearly separate phases. 
For example:
   
   ```
   select SUM(partitioned_agg) from (
       select DISTINCT_COUNT(partition_column) as partitioned_agg from table 
       group by PARTITION_ID(partition_column)
   )
   ```
   It is then the user who needs to know which function to use to aggregate 
across partitions.
   I did not propose this because, in my view, the engine should figure out how 
to optimise for partitioning, not the user. But it is an option, I guess. Maybe 
we could even ask for appropriate hints on both selects above.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

