Re: Why are nested aggregations illegal? Best alternatives?

Gavin Ray Sat, 12 Feb 2022 10:41:43 -0800

Apologies for the delay in replying

This makes things clear and seems obvious now that you point it out.
Thank you, Justin and Julian =)


Let me ask another question (if I may) that I am struggling to phrase
easily.

So with GraphQL, you might have a query like:
- "Get houses"
- "For each house get the user that lives in the house
- "And for each user get their list of todos"

The result has to come back such that it's a single object for each row
==============
{
  houses: [{
    address: "123 Main Street",
    users: [{
      name: "Joe",
      todos: [{
        description: "Take out trash"
      }]
    }]
}

>From a SQL perspective, the logical equivalent would be something like:
==============
SELECT
    house.address,
    (somehow nest users + double-nest todos under user)
FROM
    house
JOIN
    user ON user.house_id = house.id
    todos ON todos.user_id = user.id
WHERE
    house.id = 1

I'm not familiar enough with SQL to have figured out a way to make this
kind of
query using operators that are supported across most of the DB's Calcite has
adapters for.

Currently what I have done instead, on a tip from Gopalakrishna Holla from
LinkedIn Coral team who has built GraphQL-on-Calcite, was to break up the
query
into individual parts and then do the join in-memory:

SELECT ... FROM users;
SELECT ... FROM todos WHERE todos.user_id IN (ResultSet from prev response);

However, the way I am doing this seems like it's probably very inefficient.
Because I do a series of nested loops to add the key to each object in the
parent ResultSet row:

https://github.com/GavinRay97/GraphQLCalcite/blob/648e0ac4f6810a3c360d13a03e6597c33406de4b/src/main/kotlin/TableDataFetcher.kt#L135-L153

Is there some better way of doing this?
I would be eternally grateful for any advice.


On Thu, Feb 10, 2022 at 3:27 PM Julian Hyde <[email protected]> wrote:

> Yes, if you want to do multiple layers of aggregation, use CTEs (WITH) or
> nested sub-queries. For example, the following is I believe valid standard
> SQL, and actually computes something useful:
>
>   WITH q1 AS
>    (SELECT deptno, job, AVG(sal) AS avg_sal
>     FROM emp
>     GROUP BY deptno, job)
>   WITH q2 AS
>    (SELECT deptno, AVG(avg_sal) AS avg_avg_sal
>     FROM q1
>     GROUP BY deptno)
>   SELECT AVG(avg_avg_sal)
>   FROM q2
>   GROUP BY ()
>
> (You can omit the “GROUP BY ()” line, but I think it makes things clearer.)
>
> Julian
>
>
>
> > On Feb 10, 2022, at 12:17 PM, Justin Swanhart <[email protected]>
> wrote:
> >
> > I wish you could unsend emails :)  Answering my own question, no, because
> > that would return three rows with the average :D
> >
> > On Thu, Feb 10, 2022 at 3:16 PM Justin Swanhart <[email protected]>
> wrote:
> >
> >> Just out of curiosity, is the second level aggregation using AVG in a
> >> window context?  It the frame is the whole table and it aggregates over
> it?
> >>
> >> On Thu, Feb 10, 2022 at 3:12 PM Justin Swanhart <[email protected]>
> >> wrote:
> >>
> >>> That is really neat about Oracle.
> >>>
> >>> The alternative in general is to use a subquery:
> >>> SELECT avg(avg(sal)) FROM emp GROUP BY deptno;
> >>> becomes
> >>> select avg(the_avg)
> >>> from (select avg(sal) from emp group b deptno) an_alias;
> >>>
> >>> or
> >>>
> >>> with the_cte as (select avg(sal) x from emp group by deptno)
> >>> select avg(x) from the_cte;
> >>>
> >>> On Thu, Feb 10, 2022 at 3:03 PM Julian Hyde <[email protected]>
> >>> wrote:
> >>>
> >>>> Some databases, e.g. Oracle, allow TWO levels of nesting:
> >>>>
> >>>> SELECT avg(sal) FROM emp GROUP BY deptno;
> >>>>
> >>>> AVG(SAL)
> >>>> ========
> >>>> 1,566.67
> >>>> 2,175.00
> >>>> 2,916.65
> >>>>
> >>>> SELECT avg(avg(sal)) FROM emp GROUP BY deptno;
> >>>>
> >>>> AVG(SUM(SAL))
> >>>> =============
> >>>>        9,675
> >>>>
> >>>> The first level aggregates by department (returning 3 records), and
> the
> >>>> second level computes the grand total (returning 1 record). But that
> is an
> >>>> exceptional case.
> >>>>
> >>>> Generally, any expression in the SELECT or HAVING clause of an
> aggregate
> >>>> query is either ‘before’ or ‘after’ aggregation. Consider
> >>>>
> >>>>  SELECT t.x + 1 AS a, 2 + SUM(t.y + 3) AS b
> >>>>  FROM t
> >>>>  GROUP BY t.x
> >>>>
> >>>> The expressions “t.y” and “t.y + 3” occur before aggregation; “t.x”,
> >>>> “t.x + 1”, “SUM(t.y + 3)” and “2 + SUM(t.y + 3)” occur after
> aggregation.
> >>>> SQL semantics rely heavily on this stratification. Allowing an extra
> level
> >>>> of aggregation would mess it all up.
> >>>>
> >>>> Julian
> >>>>
> >>>>
> >>>>
> >>>>> On Feb 10, 2022, at 9:45 AM, Justin Swanhart <[email protected]>
> >>>> wrote:
> >>>>>
> >>>>> This is a SQL limitation.
> >>>>>
> >>>>> mysql> select sum(1);
> >>>>> +--------+
> >>>>> | sum(1) |
> >>>>> +--------+
> >>>>> |      1 |
> >>>>> +--------+
> >>>>> 1 row in set (0.00 sec)
> >>>>>
> >>>>> mysql> select sum(sum(1));
> >>>>> ERROR 1111 (HY000): Invalid use of group function
> >>>>>
> >>>>> On Thu, Feb 10, 2022 at 12:39 PM Gavin Ray <[email protected]>
> >>>> wrote:
> >>>>>
> >>>>>> Went to test this query out and found that it can't be performed:
> >>>>>>
> >>>>>> SELECT
> >>>>>>   JSON_OBJECT(
> >>>>>>       KEY 'users'
> >>>>>>       VALUE JSON_ARRAYAGG(
> >>>>>>           JSON_OBJECT(
> >>>>>>               KEY 'name' VALUE "users"."name",
> >>>>>>               KEY 'todos' VALUE JSON_ARRAYAGG(
> >>>>>>                   JSON_OBJECT(
> >>>>>>                       KEY 'description' VALUE "todos"."description"
> >>>>>>                   )
> >>>>>>               )
> >>>>>>           )
> >>>>>>       )
> >>>>>>   )
> >>>>>> FROM
> >>>>>>   "users"
> >>>>>> LEFT OUTER JOIN
> >>>>>>   "todos" ON "users"."id" = "todos"."user_id";
> >>>>>>
> >>>>>> Checking the source, seems this is a blanket policy, not a
> >>>>>> datasource-specific thing.
> >>>>>> From a functional perspective, it doesn't feel like it's much
> >>>> different
> >>>>>> from JOINs
> >>>>>> But I don't understand relational theory or DB functionality in the
> >>>> least,
> >>>>>> so I'm not fit to judge.
> >>>>>>
> >>>>>> Just curious why Calcite doesn't allow this
> >>>>>>
> >>>>
> >>>>
>
>

Re: Why are nested aggregations illegal? Best alternatives?

Reply via email to