Copilot commented on code in PR #924:
URL: 
https://github.com/apache/skywalking-banyandb/pull/924#discussion_r2665076355


##########
pkg/query/logical/measure/measure_plan_distributed.go:
##########
@@ -514,3 +534,52 @@ func (s *pushedDownAggregatedIterator) Current() 
[]*measurev1.DataPoint {
 func (s *pushedDownAggregatedIterator) Close() error {
        return nil
 }
+
+// deduplicateAggregatedDataPoints removes duplicate aggregated results from 
multiple replicas
+// by keeping only one data point per group. Since replicas hold identical 
data, aggregates
+// for the same group are identical across replicas.
+func deduplicateAggregatedDataPoints(dataPoints []*measurev1.DataPoint, 
groupByTagsRefs [][]*logical.TagRef) ([]*measurev1.DataPoint, error) {
+       if len(groupByTagsRefs) == 0 {
+               return dataPoints, nil
+       }
+       groupMap := make(map[uint64]*measurev1.DataPoint)
+       result := make([]*measurev1.DataPoint, 0)

Review Comment:
   The `deduplicateAggregatedDataPoints` function should pre-allocate the 
result slice with an appropriate capacity to avoid multiple allocations during 
append operations. Since we know the maximum possible size is 
`len(dataPoints)`, the slice should be initialized with that capacity.
   ```suggestion
        result := make([]*measurev1.DataPoint, 0, len(dataPoints))
   ```



##########
pkg/query/logical/measure/measure_analyzer.go:
##########
@@ -160,7 +160,8 @@ func DistributedAnalyze(criteria *measurev1.QueryRequest, 
ss []logical.Schema) (
        // TODO: to support all aggregation functions
        needCompletePushDownAgg := criteria.GetAgg() != nil &&
                (criteria.GetAgg().GetFunction() == 
modelv1.AggregationFunction_AGGREGATION_FUNCTION_MAX ||
-                       criteria.GetAgg().GetFunction() == 
modelv1.AggregationFunction_AGGREGATION_FUNCTION_MIN) &&
+                       criteria.GetAgg().GetFunction() == 
modelv1.AggregationFunction_AGGREGATION_FUNCTION_MIN ||
+                       criteria.GetAgg().GetFunction() == 
modelv1.AggregationFunction_AGGREGATION_FUNCTION_SUM) &&
                criteria.GetTop() == nil

Review Comment:
   This condition has two critical issues:
   
   1. **Operator precedence bug**: The condition is currently evaluated as 
`(MAX || MIN || SUM && TOP == nil)` due to && having higher precedence than ||. 
This means the `criteria.GetTop() == nil` check only applies to SUM, not to MAX 
and MIN. Missing parentheses around the OR conditions causes this issue.
   
   2. **Missing GROUP BY check**: The condition should explicitly require 
`criteria.GetGroupBy() != nil`. Without GROUP BY, pushing down the aggregation 
completely doesn't make sense in a replicated environment, as each replica 
would compute a partial aggregate that would need further aggregation. The 
deduplication logic also assumes GROUP BY exists (it does nothing when 
`groupByTagsRefs` is empty).
   
   The condition should be:
   `needCompletePushDownAgg := criteria.GetAgg() != nil && 
criteria.GetGroupBy() != nil && (criteria.GetAgg().GetFunction() == 
AGGREGATION_FUNCTION_MAX || criteria.GetAgg().GetFunction() == 
AGGREGATION_FUNCTION_MIN || criteria.GetAgg().GetFunction() == 
AGGREGATION_FUNCTION_SUM) && criteria.GetTop() == nil`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to