Copilot commented on code in PR #924:
URL:
https://github.com/apache/skywalking-banyandb/pull/924#discussion_r2665076355
##########
pkg/query/logical/measure/measure_plan_distributed.go:
##########
@@ -514,3 +534,52 @@ func (s *pushedDownAggregatedIterator) Current()
[]*measurev1.DataPoint {
func (s *pushedDownAggregatedIterator) Close() error {
return nil
}
+
+// deduplicateAggregatedDataPoints removes duplicate aggregated results from
multiple replicas
+// by keeping only one data point per group. Since replicas hold identical
data, aggregates
+// for the same group are identical across replicas.
+func deduplicateAggregatedDataPoints(dataPoints []*measurev1.DataPoint,
groupByTagsRefs [][]*logical.TagRef) ([]*measurev1.DataPoint, error) {
+ if len(groupByTagsRefs) == 0 {
+ return dataPoints, nil
+ }
+ groupMap := make(map[uint64]*measurev1.DataPoint)
+ result := make([]*measurev1.DataPoint, 0)
Review Comment:
The `deduplicateAggregatedDataPoints` function should pre-allocate the
result slice with an appropriate capacity to avoid multiple allocations during
append operations. Since we know the maximum possible size is
`len(dataPoints)`, the slice should be initialized with that capacity.
```suggestion
result := make([]*measurev1.DataPoint, 0, len(dataPoints))
```
##########
pkg/query/logical/measure/measure_analyzer.go:
##########
@@ -160,7 +160,8 @@ func DistributedAnalyze(criteria *measurev1.QueryRequest,
ss []logical.Schema) (
// TODO: to support all aggregation functions
needCompletePushDownAgg := criteria.GetAgg() != nil &&
(criteria.GetAgg().GetFunction() ==
modelv1.AggregationFunction_AGGREGATION_FUNCTION_MAX ||
- criteria.GetAgg().GetFunction() ==
modelv1.AggregationFunction_AGGREGATION_FUNCTION_MIN) &&
+ criteria.GetAgg().GetFunction() ==
modelv1.AggregationFunction_AGGREGATION_FUNCTION_MIN ||
+ criteria.GetAgg().GetFunction() ==
modelv1.AggregationFunction_AGGREGATION_FUNCTION_SUM) &&
criteria.GetTop() == nil
Review Comment:
This condition has two critical issues:
1. **Operator precedence bug**: The condition is currently evaluated as
`(MAX || MIN || SUM && TOP == nil)` due to && having higher precedence than ||.
This means the `criteria.GetTop() == nil` check only applies to SUM, not to MAX
and MIN. Missing parentheses around the OR conditions causes this issue.
2. **Missing GROUP BY check**: The condition should explicitly require
`criteria.GetGroupBy() != nil`. Without GROUP BY, pushing down the aggregation
completely doesn't make sense in a replicated environment, as each replica
would compute a partial aggregate that would need further aggregation. The
deduplication logic also assumes GROUP BY exists (it does nothing when
`groupByTagsRefs` is empty).
The condition should be:
`needCompletePushDownAgg := criteria.GetAgg() != nil &&
criteria.GetGroupBy() != nil && (criteria.GetAgg().GetFunction() ==
AGGREGATION_FUNCTION_MAX || criteria.GetAgg().GetFunction() ==
AGGREGATION_FUNCTION_MIN || criteria.GetAgg().GetFunction() ==
AGGREGATION_FUNCTION_SUM) && criteria.GetTop() == nil`
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]