Re: Why are nested aggregations illegal? Best alternatives?

Julian Hyde Sat, 12 Feb 2022 12:08:10 -0800
Correlated ARRAY sub-query?

> On Feb 12, 2022, at 10:40 AM, Gavin Ray <[email protected]> wrote:
> 
> Apologies for the delay in replying
> 
> This makes things clear and seems obvious now that you point it out.
> Thank you, Justin and Julian =)
> 
> Let me ask another question (if I may) that I am struggling to phrase
> easily.
> 
> So with GraphQL, you might have a query like:
> - "Get houses"
> - "For each house get the user that lives in the house
> - "And for each user get their list of todos"
> 
> The result has to come back such that it's a single object for each row
> ==============
> {
>  houses: [{
>    address: "123 Main Street",
>    users: [{
>      name: "Joe",
>      todos: [{
>        description: "Take out trash"
>      }]
>    }]
> }
> 
> From a SQL perspective, the logical equivalent would be something like:
> ==============
> SELECT
>    house.address,
>    (somehow nest users + double-nest todos under user)
> FROM
>    house
> JOIN
>    user ON user.house_id = house.id
>    todos ON todos.user_id = user.id
> WHERE
>    house.id = 1
> 
> I'm not familiar enough with SQL to have figured out a way to make this
> kind of
> query using operators that are supported across most of the DB's Calcite has
> adapters for.
> 
> Currently what I have done instead, on a tip from Gopalakrishna Holla from
> LinkedIn Coral team who has built GraphQL-on-Calcite, was to break up the
> query
> into individual parts and then do the join in-memory:
> 
> SELECT ... FROM users;
> SELECT ... FROM todos WHERE todos.user_id IN (ResultSet from prev response);
> 
> However, the way I am doing this seems like it's probably very inefficient.
> Because I do a series of nested loops to add the key to each object in the
> parent ResultSet row:
> 
> https://github.com/GavinRay97/GraphQLCalcite/blob/648e0ac4f6810a3c360d13a03e6597c33406de4b/src/main/kotlin/TableDataFetcher.kt#L135-L153
> 
> Is there some better way of doing this?
> I would be eternally grateful for any advice.
> 
> 
> On Thu, Feb 10, 2022 at 3:27 PM Julian Hyde <[email protected]> wrote:
> 
>> Yes, if you want to do multiple layers of aggregation, use CTEs (WITH) or
>> nested sub-queries. For example, the following is I believe valid standard
>> SQL, and actually computes something useful:
>> 
>>  WITH q1 AS
>>   (SELECT deptno, job, AVG(sal) AS avg_sal
>>    FROM emp
>>    GROUP BY deptno, job)
>>  WITH q2 AS
>>   (SELECT deptno, AVG(avg_sal) AS avg_avg_sal
>>    FROM q1
>>    GROUP BY deptno)
>>  SELECT AVG(avg_avg_sal)
>>  FROM q2
>>  GROUP BY ()
>> 
>> (You can omit the “GROUP BY ()” line, but I think it makes things clearer.)
>> 
>> Julian
>> 
>> 
>> 
>>> On Feb 10, 2022, at 12:17 PM, Justin Swanhart <[email protected]>
>> wrote:
>>> 
>>> I wish you could unsend emails :)  Answering my own question, no, because
>>> that would return three rows with the average :D
>>> 
>>> On Thu, Feb 10, 2022 at 3:16 PM Justin Swanhart <[email protected]>
>> wrote:
>>> 
>>>> Just out of curiosity, is the second level aggregation using AVG in a
>>>> window context?  It the frame is the whole table and it aggregates over
>> it?
>>>> 
>>>> On Thu, Feb 10, 2022 at 3:12 PM Justin Swanhart <[email protected]>
>>>> wrote:
>>>> 
>>>>> That is really neat about Oracle.
>>>>> 
>>>>> The alternative in general is to use a subquery:
>>>>> SELECT avg(avg(sal)) FROM emp GROUP BY deptno;
>>>>> becomes
>>>>> select avg(the_avg)
>>>>> from (select avg(sal) from emp group b deptno) an_alias;
>>>>> 
>>>>> or
>>>>> 
>>>>> with the_cte as (select avg(sal) x from emp group by deptno)
>>>>> select avg(x) from the_cte;
>>>>> 
>>>>> On Thu, Feb 10, 2022 at 3:03 PM Julian Hyde <[email protected]>
>>>>> wrote:
>>>>> 
>>>>>> Some databases, e.g. Oracle, allow TWO levels of nesting:
>>>>>> 
>>>>>> SELECT avg(sal) FROM emp GROUP BY deptno;
>>>>>> 
>>>>>> AVG(SAL)
>>>>>> ========
>>>>>> 1,566.67
>>>>>> 2,175.00
>>>>>> 2,916.65
>>>>>> 
>>>>>> SELECT avg(avg(sal)) FROM emp GROUP BY deptno;
>>>>>> 
>>>>>> AVG(SUM(SAL))
>>>>>> =============
>>>>>>       9,675
>>>>>> 
>>>>>> The first level aggregates by department (returning 3 records), and
>> the
>>>>>> second level computes the grand total (returning 1 record). But that
>> is an
>>>>>> exceptional case.
>>>>>> 
>>>>>> Generally, any expression in the SELECT or HAVING clause of an
>> aggregate
>>>>>> query is either ‘before’ or ‘after’ aggregation. Consider
>>>>>> 
>>>>>> SELECT t.x + 1 AS a, 2 + SUM(t.y + 3) AS b
>>>>>> FROM t
>>>>>> GROUP BY t.x
>>>>>> 
>>>>>> The expressions “t.y” and “t.y + 3” occur before aggregation; “t.x”,
>>>>>> “t.x + 1”, “SUM(t.y + 3)” and “2 + SUM(t.y + 3)” occur after
>> aggregation.
>>>>>> SQL semantics rely heavily on this stratification. Allowing an extra
>> level
>>>>>> of aggregation would mess it all up.
>>>>>> 
>>>>>> Julian
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>>> On Feb 10, 2022, at 9:45 AM, Justin Swanhart <[email protected]>
>>>>>> wrote:
>>>>>>> 
>>>>>>> This is a SQL limitation.
>>>>>>> 
>>>>>>> mysql> select sum(1);
>>>>>>> +--------+
>>>>>>> | sum(1) |
>>>>>>> +--------+
>>>>>>> |      1 |
>>>>>>> +--------+
>>>>>>> 1 row in set (0.00 sec)
>>>>>>> 
>>>>>>> mysql> select sum(sum(1));
>>>>>>> ERROR 1111 (HY000): Invalid use of group function
>>>>>>> 
>>>>>>> On Thu, Feb 10, 2022 at 12:39 PM Gavin Ray <[email protected]>
>>>>>> wrote:
>>>>>>> 
>>>>>>>> Went to test this query out and found that it can't be performed:
>>>>>>>> 
>>>>>>>> SELECT
>>>>>>>>  JSON_OBJECT(
>>>>>>>>      KEY 'users'
>>>>>>>>      VALUE JSON_ARRAYAGG(
>>>>>>>>          JSON_OBJECT(
>>>>>>>>              KEY 'name' VALUE "users"."name",
>>>>>>>>              KEY 'todos' VALUE JSON_ARRAYAGG(
>>>>>>>>                  JSON_OBJECT(
>>>>>>>>                      KEY 'description' VALUE "todos"."description"
>>>>>>>>                  )
>>>>>>>>              )
>>>>>>>>          )
>>>>>>>>      )
>>>>>>>>  )
>>>>>>>> FROM
>>>>>>>>  "users"
>>>>>>>> LEFT OUTER JOIN
>>>>>>>>  "todos" ON "users"."id" = "todos"."user_id";
>>>>>>>> 
>>>>>>>> Checking the source, seems this is a blanket policy, not a
>>>>>>>> datasource-specific thing.
>>>>>>>> From a functional perspective, it doesn't feel like it's much
>>>>>> different
>>>>>>>> from JOINs
>>>>>>>> But I don't understand relational theory or DB functionality in the
>>>>>> least,
>>>>>>>> so I'm not fit to judge.
>>>>>>>> 
>>>>>>>> Just curious why Calcite doesn't allow this
>>>>>>>> 
>>>>>> 
>>>>>> 
>> 
>>
Re: Why are nested aggregations illegal? Best alternatives?

Reply via email to