Re: Why are nested aggregations illegal? Best alternatives?

Gavin Ray Sat, 12 Feb 2022 13:13:49 -0800

Nevermind, this is a standard term not something Calcite-specific it seems!


https://en.wikipedia.org/wiki/Correlated_subquery

On Sat, Feb 12, 2022 at 3:46 PM Gavin Ray <[email protected]> wrote:

> Forgive my ignorance/lack of experience
>
> I am somewhat familiar with the ARRAY() function, but not sure I know the
> term "correlated"
> Searching the Calcite codebase for uses of "correlated" + "query", I found:
>
>
> https://github.com/apache/calcite/blob/1d4f1b394bfdba03c5538017e12ab2431b137ca9/core/src/test/java/org/apache/calcite/test/SqlToRelConverterTest.java#L1603-L1612
>
>   @Test void testCorrelatedSubQueryInJoin() {
>     final String sql = "select *\n"
>         + "from emp as e\n"
>         + "join dept as d using (deptno)\n"
>         + "where d.name = (\n"
>         + "  select max(name)\n"
>         + "  from dept as d2\n"
>         + "  where d2.deptno = d.deptno)";
>     sql(sql).withExpand(false).ok();
>   }
>
> But I also see this, which says it is "uncorrelated" but seems very
> similar?
>
>   @Test void testInUncorrelatedSubQuery() {
>     final String sql = "select empno from emp where deptno in"
>         + " (select deptno from dept)";
>     sql(sql).ok();
>   }
>
> I wouldn't blame you for not answering such a basic question -- but what
> exactly does "correlation" mean here?
>
> Thanks, as usual Julian
>
>
> On Sat, Feb 12, 2022 at 3:08 PM Julian Hyde <[email protected]>
> wrote:
>
>> Correlated ARRAY sub-query?
>>
>> > On Feb 12, 2022, at 10:40 AM, Gavin Ray <[email protected]> wrote:
>> >
>> > Apologies for the delay in replying
>> >
>> > This makes things clear and seems obvious now that you point it out.
>> > Thank you, Justin and Julian =)
>> >
>> > Let me ask another question (if I may) that I am struggling to phrase
>> > easily.
>> >
>> > So with GraphQL, you might have a query like:
>> > - "Get houses"
>> > - "For each house get the user that lives in the house
>> > - "And for each user get their list of todos"
>> >
>> > The result has to come back such that it's a single object for each row
>> > ==============
>> > {
>> >  houses: [{
>> >    address: "123 Main Street",
>> >    users: [{
>> >      name: "Joe",
>> >      todos: [{
>> >        description: "Take out trash"
>> >      }]
>> >    }]
>> > }
>> >
>> > From a SQL perspective, the logical equivalent would be something like:
>> > ==============
>> > SELECT
>> >    house.address,
>> >    (somehow nest users + double-nest todos under user)
>> > FROM
>> >    house
>> > JOIN
>> >    user ON user.house_id = house.id
>> >    todos ON todos.user_id = user.id
>> > WHERE
>> >    house.id = 1
>> >
>> > I'm not familiar enough with SQL to have figured out a way to make this
>> > kind of
>> > query using operators that are supported across most of the DB's
>> Calcite has
>> > adapters for.
>> >
>> > Currently what I have done instead, on a tip from Gopalakrishna Holla
>> from
>> > LinkedIn Coral team who has built GraphQL-on-Calcite, was to break up
>> the
>> > query
>> > into individual parts and then do the join in-memory:
>> >
>> > SELECT ... FROM users;
>> > SELECT ... FROM todos WHERE todos.user_id IN (ResultSet from prev
>> response);
>> >
>> > However, the way I am doing this seems like it's probably very
>> inefficient.
>> > Because I do a series of nested loops to add the key to each object in
>> the
>> > parent ResultSet row:
>> >
>> >
>> https://github.com/GavinRay97/GraphQLCalcite/blob/648e0ac4f6810a3c360d13a03e6597c33406de4b/src/main/kotlin/TableDataFetcher.kt#L135-L153
>> >
>> > Is there some better way of doing this?
>> > I would be eternally grateful for any advice.
>> >
>> >
>> > On Thu, Feb 10, 2022 at 3:27 PM Julian Hyde <[email protected]>
>> wrote:
>> >
>> >> Yes, if you want to do multiple layers of aggregation, use CTEs (WITH)
>> or
>> >> nested sub-queries. For example, the following is I believe valid
>> standard
>> >> SQL, and actually computes something useful:
>> >>
>> >>  WITH q1 AS
>> >>   (SELECT deptno, job, AVG(sal) AS avg_sal
>> >>    FROM emp
>> >>    GROUP BY deptno, job)
>> >>  WITH q2 AS
>> >>   (SELECT deptno, AVG(avg_sal) AS avg_avg_sal
>> >>    FROM q1
>> >>    GROUP BY deptno)
>> >>  SELECT AVG(avg_avg_sal)
>> >>  FROM q2
>> >>  GROUP BY ()
>> >>
>> >> (You can omit the “GROUP BY ()” line, but I think it makes things
>> clearer.)
>> >>
>> >> Julian
>> >>
>> >>
>> >>
>> >>> On Feb 10, 2022, at 12:17 PM, Justin Swanhart <[email protected]>
>> >> wrote:
>> >>>
>> >>> I wish you could unsend emails :)  Answering my own question, no,
>> because
>> >>> that would return three rows with the average :D
>> >>>
>> >>> On Thu, Feb 10, 2022 at 3:16 PM Justin Swanhart <[email protected]>
>> >> wrote:
>> >>>
>> >>>> Just out of curiosity, is the second level aggregation using AVG in a
>> >>>> window context?  It the frame is the whole table and it aggregates
>> over
>> >> it?
>> >>>>
>> >>>> On Thu, Feb 10, 2022 at 3:12 PM Justin Swanhart <[email protected]
>> >
>> >>>> wrote:
>> >>>>
>> >>>>> That is really neat about Oracle.
>> >>>>>
>> >>>>> The alternative in general is to use a subquery:
>> >>>>> SELECT avg(avg(sal)) FROM emp GROUP BY deptno;
>> >>>>> becomes
>> >>>>> select avg(the_avg)
>> >>>>> from (select avg(sal) from emp group b deptno) an_alias;
>> >>>>>
>> >>>>> or
>> >>>>>
>> >>>>> with the_cte as (select avg(sal) x from emp group by deptno)
>> >>>>> select avg(x) from the_cte;
>> >>>>>
>> >>>>> On Thu, Feb 10, 2022 at 3:03 PM Julian Hyde <[email protected]
>> >
>> >>>>> wrote:
>> >>>>>
>> >>>>>> Some databases, e.g. Oracle, allow TWO levels of nesting:
>> >>>>>>
>> >>>>>> SELECT avg(sal) FROM emp GROUP BY deptno;
>> >>>>>>
>> >>>>>> AVG(SAL)
>> >>>>>> ========
>> >>>>>> 1,566.67
>> >>>>>> 2,175.00
>> >>>>>> 2,916.65
>> >>>>>>
>> >>>>>> SELECT avg(avg(sal)) FROM emp GROUP BY deptno;
>> >>>>>>
>> >>>>>> AVG(SUM(SAL))
>> >>>>>> =============
>> >>>>>>       9,675
>> >>>>>>
>> >>>>>> The first level aggregates by department (returning 3 records), and
>> >> the
>> >>>>>> second level computes the grand total (returning 1 record). But
>> that
>> >> is an
>> >>>>>> exceptional case.
>> >>>>>>
>> >>>>>> Generally, any expression in the SELECT or HAVING clause of an
>> >> aggregate
>> >>>>>> query is either ‘before’ or ‘after’ aggregation. Consider
>> >>>>>>
>> >>>>>> SELECT t.x + 1 AS a, 2 + SUM(t.y + 3) AS b
>> >>>>>> FROM t
>> >>>>>> GROUP BY t.x
>> >>>>>>
>> >>>>>> The expressions “t.y” and “t.y + 3” occur before aggregation;
>> “t.x”,
>> >>>>>> “t.x + 1”, “SUM(t.y + 3)” and “2 + SUM(t.y + 3)” occur after
>> >> aggregation.
>> >>>>>> SQL semantics rely heavily on this stratification. Allowing an
>> extra
>> >> level
>> >>>>>> of aggregation would mess it all up.
>> >>>>>>
>> >>>>>> Julian
>> >>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>>> On Feb 10, 2022, at 9:45 AM, Justin Swanhart <[email protected]
>> >
>> >>>>>> wrote:
>> >>>>>>>
>> >>>>>>> This is a SQL limitation.
>> >>>>>>>
>> >>>>>>> mysql> select sum(1);
>> >>>>>>> +--------+
>> >>>>>>> | sum(1) |
>> >>>>>>> +--------+
>> >>>>>>> |      1 |
>> >>>>>>> +--------+
>> >>>>>>> 1 row in set (0.00 sec)
>> >>>>>>>
>> >>>>>>> mysql> select sum(sum(1));
>> >>>>>>> ERROR 1111 (HY000): Invalid use of group function
>> >>>>>>>
>> >>>>>>> On Thu, Feb 10, 2022 at 12:39 PM Gavin Ray <[email protected]
>> >
>> >>>>>> wrote:
>> >>>>>>>
>> >>>>>>>> Went to test this query out and found that it can't be performed:
>> >>>>>>>>
>> >>>>>>>> SELECT
>> >>>>>>>>  JSON_OBJECT(
>> >>>>>>>>      KEY 'users'
>> >>>>>>>>      VALUE JSON_ARRAYAGG(
>> >>>>>>>>          JSON_OBJECT(
>> >>>>>>>>              KEY 'name' VALUE "users"."name",
>> >>>>>>>>              KEY 'todos' VALUE JSON_ARRAYAGG(
>> >>>>>>>>                  JSON_OBJECT(
>> >>>>>>>>                      KEY 'description' VALUE
>> "todos"."description"
>> >>>>>>>>                  )
>> >>>>>>>>              )
>> >>>>>>>>          )
>> >>>>>>>>      )
>> >>>>>>>>  )
>> >>>>>>>> FROM
>> >>>>>>>>  "users"
>> >>>>>>>> LEFT OUTER JOIN
>> >>>>>>>>  "todos" ON "users"."id" = "todos"."user_id";
>> >>>>>>>>
>> >>>>>>>> Checking the source, seems this is a blanket policy, not a
>> >>>>>>>> datasource-specific thing.
>> >>>>>>>> From a functional perspective, it doesn't feel like it's much
>> >>>>>> different
>> >>>>>>>> from JOINs
>> >>>>>>>> But I don't understand relational theory or DB functionality in
>> the
>> >>>>>> least,
>> >>>>>>>> so I'm not fit to judge.
>> >>>>>>>>
>> >>>>>>>> Just curious why Calcite doesn't allow this
>> >>>>>>>>
>> >>>>>>
>> >>>>>>
>> >>
>> >>
>>
>>

Re: Why are nested aggregations illegal? Best alternatives?

Reply via email to