rdblue commented on code in PR #6981:
URL: https://github.com/apache/iceberg/pull/6981#discussion_r1155365155
##########
api/src/main/java/org/apache/iceberg/expressions/CountAggregate.java:
##########
@@ -47,25 +50,40 @@ protected Long countFor(DataFile file) {
}
@Override
- public Aggregator<Long> newAggregator() {
- return new CountAggregator<>(this);
+ public Aggregator<Long> newAggregator(List<BoundGroupBy<?, ?>> groupBys) {
+ return new CountAggregator<>(this, groupBys);
}
private static class CountAggregator<T> extends NullSafeAggregator<T, Long> {
private Long count = 0L;
+ private Map<StructLike, Long> countP = Maps.newHashMap();
- CountAggregator(BoundAggregate<T, Long> aggregate) {
- super(aggregate);
+ CountAggregator(BoundAggregate<T, Long> aggregate, List<BoundGroupBy<?,
?>> groupBys) {
+ super(aggregate, groupBys);
}
@Override
protected void update(Long value) {
count += value;
}
+ @Override
+ protected void update(Long value, StructLike partitionKey) {
+ if (countP.get(partitionKey) == null) {
+ countP.put(partitionKey, value);
+ } else {
+ countP.put(partitionKey, countP.get(partitionKey) + value);
+ }
+ }
+
@Override
protected Long current() {
return count;
}
+
+ @Override
+ protected Map<StructLike, Long> currentPartition() {
Review Comment:
Why change the aggregator to be partition aware? It seems like it would be
cleaner to leave this class as it was and wrap it. Maybe introduce a
`GroupingAggregator` that has a map of aggregators by partition key. That way
the aggregators stay roughly the same and the grouping functionality is layered
on top.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]