Re: [DISCUSS] Support GROUP BY ALL (CALCITE-7594)

Julian Hyde Tue, 09 Jun 2026 11:58:45 -0700

> As for the 0-vs-1
> row question, happy to leave that to the existing behavior (and the
> language lawyers).


There’s no existing behavior, so you need to decide what is the right behavior, 
and test it. See what the other implementations do.

> On Jun 9, 2026, at 11:14 AM, Tisya Bhatia via dev <[email protected]> 
> wrote:
> 
> Hey Julian -
> 
> Glad you're on board. Good point about the "contains an aggregate" vs. "is
> an aggregate." My implementation (PR-pending) already does this: it expands
> GROUP BY ALL to every SELECT expression that does not contain an aggregate,
> using a recursive aggregate finder. So your example expands exactly as you
> wrote.
> 
> I'll also take note of measures - thanks, I'd missed that. And I'll add
> your example as a test case.
> 
> Regarding the degenerate case (grouping only by constants over an empty
> input): GROUP BY ALL there expands to the exact same query as explicitly
> writing GROUP BY 'x', so it introduces no new semantics - it inherits
> whatever Calcite already does for constant-only grouping. As for the 0-vs-1
> row question, happy to leave that to the existing behavior (and the
> language lawyers).
> 
> Thanks,
> Tisya Bhatia
> 
> On Tue, Jun 9, 2026 at 12:57 PM Julian Hyde <[email protected] 
> <mailto:[email protected]>> wrote:
> 
>> I agree; we should support GROUP BY ALL.
>> 
>> “Isn’t an aggregate” should be broadened to something like “doesn’t
>> contain an aggregate and isn’t a measure”.
>> 
>>  SELECT deptno, SUBSTR(job, 1), count(*) + 1 as c, ‘x' as x
>>  FROM emps
>>  GROUP BY ALL
>> 
>> should be equivalent to
>> 
>>  SELECT deptno, SUBSTR(job, 1), count(*) + 1 as c, ‘x' as x
>>  FROM emps
>>  GROUP BY deptno, SUBSTR(job, 1), ‘x'
>> 
>> I’m guessing that’s what the major implementations do. Measures are a
>> Calcite-specific thing.
>> 
>> Note that the constant ‘x’ gets swept into the GROUP BY clause but has no
>> effect because it’s a constant.
>> When there are no rows and we’re only grouping by constants we have a
>> degenerate case and the language lawyers can argue about whether it should
>> return 0 or 1 rows:
>> 
>>  SELECT ‘x’ as x, COUNT(*)
>>  FROM emps
>>  WHERE FALSE
>>  GROUP BY ALL
>> 
>> Julian
>> 
>> 
>> 
>>> On Jun 9, 2026, at 10:25 AM, Tisya Bhatia via dev <
>> [email protected]> wrote:
>>> 
>>> Hi all,
>>> 
>>> I'd like to propose support for GROUP BY ALL in Calcite - a shorthand
>> that
>>> groups every expression in the SELECT clause that isn't an aggregate.
>>> 
>>> Here is the Jira ticket:
>> https://urldefense.com/v3/__https://issues.apache.org/jira/browse/CALCITE-7594__;!!Ayb5sqE7!pEba7l9-97xJOhAdqD6pEBkcgCqK3bHZKZBQDYL8-qHcbXxmWfNc0DmlEL7m8Qm9RF2bQ-92FfXAOcqWtWv7jdJL$
>>> 
>>> Context:
>>> - Today, if I want to group by every non-aggregated expression in the
>>> SELECT list, I would have to repeat them in the group by.
>>> - A GROUP BY ALL shorthand has become convergence across warehouse
>>> dialects, and supporting it keeps Calcite-based dialects updated and
>>> aligned.
>>> 
>>> Proposal:
>>> - Add GROUP BY ALL: when ALL appears with no grouping items, group by
>> every
>>> SELECT expression that is not an aggregate or window function. It's
>>> implemented as a parser marker that the validator rewrites into the
>>> concrete grouping expressions before normal group validation runs, so the
>>> converter / optimizer never see a marker.
>>> - Alternatives considered: I kept ALL (matches DuckDB, Spark, and
>> others).
>>> The existing ALL / DISTINCT set quantifier on grouping items
>> (CALCITE-5089,
>>> e.g. GROUP BY ALL CUBE(a, b)) is fully preserved. The new meaning only
>>> applies when no grouping items follow ALL.
>>> - Expected impact / tradeoff: purely additive. Bare GROUP BY ALL does not
>>> parse today, so enabling it cannot change the meaning of any existing
>>> query. SELECT * with GROUP BY ALL is rejected with a clear error for now
>>> (the star isn't expanded at validation time)
>>> 
>>> For example:
>>> 
>>> SELECT deptno, job, SUM(sal)
>>> FROM emp
>>> GROUP BY ALL;
>>> 
>>> resolves to
>>> 
>>> SELECT deptno, job, SUM(sal)
>>> FROM emp
>>> GROUP BY deptno, job;
>>> 
>>> Open questions:
>>> - Since this feature is additive, I don't think a conformance flag is
>>> needed, but would we prefer one?
>>> - Should SELECT * be supported in this proposal, or is the explicit error
>>> enough for an initial cut?
>>> 
>>> If this direction sounds reasonable, I have a prototype ready (parser +
>>> validator + tests + reference docs) and will put up the PR soon. A
>>> companion ORDER BY ALL proposal (CALCITE-7597) is in the same vein.
>>> 
>>> Best,
>>> Tisya Bhatia

Re: [DISCUSS] Support GROUP BY ALL (CALCITE-7594)

Reply via email to