thinkharderdev commented on code in PR #12571:
URL: https://github.com/apache/datafusion/pull/12571#discussion_r1773982005


##########
datafusion/sqllogictest/test_files/aggregate.slt:
##########
@@ -3512,6 +3512,18 @@ SELECT MIN(value), MAX(value) FROM integers_with_nulls
 ----
 1 5
 
+# grouping_sets with null values
+query II rowsort
+SELECT value, min(value) FROM integers_with_nulls GROUP BY CUBE(value)
+----
+1 1
+3 3
+4 4
+5 5
+NULL 1
+NULL NULL

Review Comment:
   IIUC then without the fix in this PR the final `NULL NULL` row would be 
omitted?



##########
datafusion/physical-plan/src/aggregates/mod.rs:
##########
@@ -108,6 +110,8 @@ impl AggregateMode {
     }
 }
 
+const INTERNAL_GROUPING_ID: &str = "grouping_id";

Review Comment:
   What happens if this conflicts with a user-defined field? E.g. if I had a 
query like:
   ```
   SELECT grouping_id, count(1) FROM table GROUP BY CUBE(grouping_id)
   ```



##########
datafusion/physical-plan/src/aggregates/mod.rs:
##########
@@ -137,6 +141,10 @@ pub struct PhysicalGroupBy {
     /// expression in null_expr. If `groups[i][j]` is true, then the
     /// j-th expression in the i-th group is NULL, otherwise it is `expr[j]`.
     groups: Vec<Vec<bool>>,
+    // The number of internal expressions that are used to implement grouping
+    // sets. These output are removed from the final output and not in `expr`
+    // as they are generated based on the value in `groups`
+    num_internal_exprs: usize,

Review Comment:
   Is there a scenario in which this is something other than `0` or `1`?



##########
datafusion/physical-plan/src/aggregates/mod.rs:
##########
@@ -210,13 +221,114 @@ impl PhysicalGroupBy {
             .collect()
     }
 
+    /// The number of expressions in the output schema.
+    fn num_output_exprs(&self, mode: &AggregateMode) -> usize {
+        let mut num_exprs = self.expr.len();
+        if !self.is_single() {
+            num_exprs += self.num_internal_exprs;
+        }
+        if *mode != AggregateMode::Partial {
+            num_exprs -= self.num_internal_exprs;
+        }
+        num_exprs
+    }
+
     /// Return grouping expressions as they occur in the output schema.
-    pub fn output_exprs(&self) -> Vec<Arc<dyn PhysicalExpr>> {
-        self.expr
-            .iter()
-            .enumerate()
-            .map(|(index, (_, name))| Arc::new(Column::new(name, index)) as _)
-            .collect()
+    pub fn output_exprs(&self, mode: &AggregateMode) -> Vec<Arc<dyn 
PhysicalExpr>> {
+        let num_output_exprs = self.num_output_exprs(mode);
+        let mut output_exprs = Vec::with_capacity(num_output_exprs);
+        output_exprs.extend(
+            self.expr
+                .iter()
+                .enumerate()
+                .take(num_output_exprs)
+                .map(|(index, (_, name))| Arc::new(Column::new(name, index)) 
as _),
+        );
+        if !self.is_single() && *mode == AggregateMode::Partial {
+            output_exprs
+                .push(Arc::new(Column::new(INTERNAL_GROUPING_ID, 
self.expr.len())) as _);
+        }
+        output_exprs
+    }
+
+    /// Returns the number expression as grouping keys.
+    fn num_group_exprs(&self) -> usize {
+        if self.is_single() {
+            self.expr.len()
+        } else {
+            self.expr.len() + self.num_internal_exprs
+        }
+    }
+
+    /// Returns the data type of the grouping id.

Review Comment:
   Maybe a small comment on what the value we use as the grouping ID. Took me a 
moment to understand the logic below. 
   
   ```suggestion
       /// Returns the data type of the grouping id.
       /// The grouping ID value is a bitmask where each set bit
       /// indicates that the corresponding grouping expression is 
       /// null 
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Reply via email to