Apologies for the delay in replying
This makes things clear and seems obvious now that you point it out.
Thank you, Justin and Julian =)
Let me ask another question (if I may) that I am struggling to phrase
easily.
So with GraphQL, you might have a query like:
- "Get houses"
- "For each house get the user that lives in the house
- "And for each user get their list of todos"
The result has to come back such that it's a single object for each row
==============
{
houses: [{
address: "123 Main Street",
users: [{
name: "Joe",
todos: [{
description: "Take out trash"
}]
}]
}
>From a SQL perspective, the logical equivalent would be something like:
==============
SELECT
house.address,
(somehow nest users + double-nest todos under user)
FROM
house
JOIN
user ON user.house_id = house.id
todos ON todos.user_id = user.id
WHERE
house.id = 1
I'm not familiar enough with SQL to have figured out a way to make this
kind of
query using operators that are supported across most of the DB's Calcite has
adapters for.
Currently what I have done instead, on a tip from Gopalakrishna Holla from
LinkedIn Coral team who has built GraphQL-on-Calcite, was to break up the
query
into individual parts and then do the join in-memory:
SELECT ... FROM users;
SELECT ... FROM todos WHERE todos.user_id IN (ResultSet from prev response);
However, the way I am doing this seems like it's probably very inefficient.
Because I do a series of nested loops to add the key to each object in the
parent ResultSet row:
https://github.com/GavinRay97/GraphQLCalcite/blob/648e0ac4f6810a3c360d13a03e6597c33406de4b/src/main/kotlin/TableDataFetcher.kt#L135-L153
Is there some better way of doing this?
I would be eternally grateful for any advice.
On Thu, Feb 10, 2022 at 3:27 PM Julian Hyde <[email protected]> wrote:
> Yes, if you want to do multiple layers of aggregation, use CTEs (WITH) or
> nested sub-queries. For example, the following is I believe valid standard
> SQL, and actually computes something useful:
>
> WITH q1 AS
> (SELECT deptno, job, AVG(sal) AS avg_sal
> FROM emp
> GROUP BY deptno, job)
> WITH q2 AS
> (SELECT deptno, AVG(avg_sal) AS avg_avg_sal
> FROM q1
> GROUP BY deptno)
> SELECT AVG(avg_avg_sal)
> FROM q2
> GROUP BY ()
>
> (You can omit the “GROUP BY ()” line, but I think it makes things clearer.)
>
> Julian
>
>
>
> > On Feb 10, 2022, at 12:17 PM, Justin Swanhart <[email protected]>
> wrote:
> >
> > I wish you could unsend emails :) Answering my own question, no, because
> > that would return three rows with the average :D
> >
> > On Thu, Feb 10, 2022 at 3:16 PM Justin Swanhart <[email protected]>
> wrote:
> >
> >> Just out of curiosity, is the second level aggregation using AVG in a
> >> window context? It the frame is the whole table and it aggregates over
> it?
> >>
> >> On Thu, Feb 10, 2022 at 3:12 PM Justin Swanhart <[email protected]>
> >> wrote:
> >>
> >>> That is really neat about Oracle.
> >>>
> >>> The alternative in general is to use a subquery:
> >>> SELECT avg(avg(sal)) FROM emp GROUP BY deptno;
> >>> becomes
> >>> select avg(the_avg)
> >>> from (select avg(sal) from emp group b deptno) an_alias;
> >>>
> >>> or
> >>>
> >>> with the_cte as (select avg(sal) x from emp group by deptno)
> >>> select avg(x) from the_cte;
> >>>
> >>> On Thu, Feb 10, 2022 at 3:03 PM Julian Hyde <[email protected]>
> >>> wrote:
> >>>
> >>>> Some databases, e.g. Oracle, allow TWO levels of nesting:
> >>>>
> >>>> SELECT avg(sal) FROM emp GROUP BY deptno;
> >>>>
> >>>> AVG(SAL)
> >>>> ========
> >>>> 1,566.67
> >>>> 2,175.00
> >>>> 2,916.65
> >>>>
> >>>> SELECT avg(avg(sal)) FROM emp GROUP BY deptno;
> >>>>
> >>>> AVG(SUM(SAL))
> >>>> =============
> >>>> 9,675
> >>>>
> >>>> The first level aggregates by department (returning 3 records), and
> the
> >>>> second level computes the grand total (returning 1 record). But that
> is an
> >>>> exceptional case.
> >>>>
> >>>> Generally, any expression in the SELECT or HAVING clause of an
> aggregate
> >>>> query is either ‘before’ or ‘after’ aggregation. Consider
> >>>>
> >>>> SELECT t.x + 1 AS a, 2 + SUM(t.y + 3) AS b
> >>>> FROM t
> >>>> GROUP BY t.x
> >>>>
> >>>> The expressions “t.y” and “t.y + 3” occur before aggregation; “t.x”,
> >>>> “t.x + 1”, “SUM(t.y + 3)” and “2 + SUM(t.y + 3)” occur after
> aggregation.
> >>>> SQL semantics rely heavily on this stratification. Allowing an extra
> level
> >>>> of aggregation would mess it all up.
> >>>>
> >>>> Julian
> >>>>
> >>>>
> >>>>
> >>>>> On Feb 10, 2022, at 9:45 AM, Justin Swanhart <[email protected]>
> >>>> wrote:
> >>>>>
> >>>>> This is a SQL limitation.
> >>>>>
> >>>>> mysql> select sum(1);
> >>>>> +--------+
> >>>>> | sum(1) |
> >>>>> +--------+
> >>>>> | 1 |
> >>>>> +--------+
> >>>>> 1 row in set (0.00 sec)
> >>>>>
> >>>>> mysql> select sum(sum(1));
> >>>>> ERROR 1111 (HY000): Invalid use of group function
> >>>>>
> >>>>> On Thu, Feb 10, 2022 at 12:39 PM Gavin Ray <[email protected]>
> >>>> wrote:
> >>>>>
> >>>>>> Went to test this query out and found that it can't be performed:
> >>>>>>
> >>>>>> SELECT
> >>>>>> JSON_OBJECT(
> >>>>>> KEY 'users'
> >>>>>> VALUE JSON_ARRAYAGG(
> >>>>>> JSON_OBJECT(
> >>>>>> KEY 'name' VALUE "users"."name",
> >>>>>> KEY 'todos' VALUE JSON_ARRAYAGG(
> >>>>>> JSON_OBJECT(
> >>>>>> KEY 'description' VALUE "todos"."description"
> >>>>>> )
> >>>>>> )
> >>>>>> )
> >>>>>> )
> >>>>>> )
> >>>>>> FROM
> >>>>>> "users"
> >>>>>> LEFT OUTER JOIN
> >>>>>> "todos" ON "users"."id" = "todos"."user_id";
> >>>>>>
> >>>>>> Checking the source, seems this is a blanket policy, not a
> >>>>>> datasource-specific thing.
> >>>>>> From a functional perspective, it doesn't feel like it's much
> >>>> different
> >>>>>> from JOINs
> >>>>>> But I don't understand relational theory or DB functionality in the
> >>>> least,
> >>>>>> so I'm not fit to judge.
> >>>>>>
> >>>>>> Just curious why Calcite doesn't allow this
> >>>>>>
> >>>>
> >>>>
>
>