[GitHub] [iceberg] rdblue commented on a diff in pull request #6981: Push down group by for partition columns

via GitHub Sun, 02 Apr 2023 12:44:16 -0700


rdblue commented on code in PR #6981:
URL: https://github.com/apache/iceberg/pull/6981#discussion_r1155365155



##########
api/src/main/java/org/apache/iceberg/expressions/CountAggregate.java:
##########
@@ -47,25 +50,40 @@ protected Long countFor(DataFile file) {
   }
 
   @Override
-  public Aggregator<Long> newAggregator() {
-    return new CountAggregator<>(this);
+  public Aggregator<Long> newAggregator(List<BoundGroupBy<?, ?>> groupBys) {
+    return new CountAggregator<>(this, groupBys);
   }
 
   private static class CountAggregator<T> extends NullSafeAggregator<T, Long> {
     private Long count = 0L;
+    private Map<StructLike, Long> countP = Maps.newHashMap();
 
-    CountAggregator(BoundAggregate<T, Long> aggregate) {
-      super(aggregate);
+    CountAggregator(BoundAggregate<T, Long> aggregate, List<BoundGroupBy<?, 
?>> groupBys) {
+      super(aggregate, groupBys);
     }
 
     @Override
     protected void update(Long value) {
       count += value;
     }
 
+    @Override
+    protected void update(Long value, StructLike partitionKey) {
+      if (countP.get(partitionKey) == null) {
+        countP.put(partitionKey, value);
+      } else {
+        countP.put(partitionKey, countP.get(partitionKey) + value);
+      }
+    }
+
     @Override
     protected Long current() {
       return count;
     }
+
+    @Override
+    protected Map<StructLike, Long> currentPartition() {

Review Comment:
   Why change the aggregator to be partition aware? It seems like it would be 
cleaner to leave this class as it was and wrap it. Maybe introduce a 
`GroupingAggregator` that has a map of aggregators by partition key. That way 
the aggregators stay roughly the same and the grouping functionality is layered 
on top.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[GitHub] [iceberg] rdblue commented on a diff in pull request #6981: Push down group by for partition columns

Reply via email to