After ~5 hours, I think I may have made some progress =)
I have this, which currently works. The problem is that the nested columns
don't have names on them.
Since I need to return a nested "Map<String, Object>", I have to figure out
how to convert this query into a form that gives column names.
But this is still great progress I think!
SELECT
"todos".*,
ARRAY(
SELECT
"users".*,
ARRAY(
SELECT
"todos".*
FROM
"todos"
) AS "todos"
FROM
"users"
) AS "users"
FROM
"todos"
WHERE
"user_id" IN (
SELECT
"user_id"
FROM
"users"
WHERE
"house_id" IN (
SELECT
"id"
FROM
"houses"
)
);
+----+---------+------------------------+------------------------------------------------------------------------------+
| id | user_id | description |
|
+----+---------+------------------------+------------------------------------------------------------------------------+
| 1 | 1 | Take out the trash | [{1, John, 1, [{1, 1, Take out
the trash}, {2, 1, Watch my favorite show}, { |
| 2 | 1 | Watch my favorite show | [{1, John, 1, [{1, 1, Take out
the trash}, {2, 1, Watch my favorite show}, { |
| 3 | 1 | Charge my phone | [{1, John, 1, [{1, 1, Take out
the trash}, {2, 1, Watch my favorite show}, { |
| 4 | 2 | Cook dinner | [{1, John, 1, [{1, 1, Take out
the trash}, {2, 1, Watch my favorite show}, { |
| 5 | 2 | Read a book | [{1, John, 1, [{1, 1, Take out
the trash}, {2, 1, Watch my favorite show}, { |
| 6 | 2 | Organize office | [{1, John, 1, [{1, 1, Take out
the trash}, {2, 1, Watch my favorite show}, { |
| 7 | 3 | Walk the dog | [{1, John, 1, [{1, 1, Take out
the trash}, {2, 1, Watch my favorite show}, { |
| 8 | 3 | Feed the cat | [{1, John, 1, [{1, 1, Take out
the trash}, {2, 1, Watch my favorite show}, { |
+----+---------+------------------------+------------------------------------------------------------------------------+
On Sat, Feb 12, 2022 at 4:13 PM Gavin Ray <[email protected]> wrote:
> Nevermind, this is a standard term not something Calcite-specific it seems!
>
> https://en.wikipedia.org/wiki/Correlated_subquery
>
> On Sat, Feb 12, 2022 at 3:46 PM Gavin Ray <[email protected]> wrote:
>
>> Forgive my ignorance/lack of experience
>>
>> I am somewhat familiar with the ARRAY() function, but not sure I know the
>> term "correlated"
>> Searching the Calcite codebase for uses of "correlated" + "query", I
>> found:
>>
>>
>> https://github.com/apache/calcite/blob/1d4f1b394bfdba03c5538017e12ab2431b137ca9/core/src/test/java/org/apache/calcite/test/SqlToRelConverterTest.java#L1603-L1612
>>
>> @Test void testCorrelatedSubQueryInJoin() {
>> final String sql = "select *\n"
>> + "from emp as e\n"
>> + "join dept as d using (deptno)\n"
>> + "where d.name = (\n"
>> + " select max(name)\n"
>> + " from dept as d2\n"
>> + " where d2.deptno = d.deptno)";
>> sql(sql).withExpand(false).ok();
>> }
>>
>> But I also see this, which says it is "uncorrelated" but seems very
>> similar?
>>
>> @Test void testInUncorrelatedSubQuery() {
>> final String sql = "select empno from emp where deptno in"
>> + " (select deptno from dept)";
>> sql(sql).ok();
>> }
>>
>> I wouldn't blame you for not answering such a basic question -- but what
>> exactly does "correlation" mean here?
>>
>> Thanks, as usual Julian
>>
>>
>> On Sat, Feb 12, 2022 at 3:08 PM Julian Hyde <[email protected]>
>> wrote:
>>
>>> Correlated ARRAY sub-query?
>>>
>>> > On Feb 12, 2022, at 10:40 AM, Gavin Ray <[email protected]> wrote:
>>> >
>>> > Apologies for the delay in replying
>>> >
>>> > This makes things clear and seems obvious now that you point it out.
>>> > Thank you, Justin and Julian =)
>>> >
>>> > Let me ask another question (if I may) that I am struggling to phrase
>>> > easily.
>>> >
>>> > So with GraphQL, you might have a query like:
>>> > - "Get houses"
>>> > - "For each house get the user that lives in the house
>>> > - "And for each user get their list of todos"
>>> >
>>> > The result has to come back such that it's a single object for each row
>>> > ==============
>>> > {
>>> > houses: [{
>>> > address: "123 Main Street",
>>> > users: [{
>>> > name: "Joe",
>>> > todos: [{
>>> > description: "Take out trash"
>>> > }]
>>> > }]
>>> > }
>>> >
>>> > From a SQL perspective, the logical equivalent would be something like:
>>> > ==============
>>> > SELECT
>>> > house.address,
>>> > (somehow nest users + double-nest todos under user)
>>> > FROM
>>> > house
>>> > JOIN
>>> > user ON user.house_id = house.id
>>> > todos ON todos.user_id = user.id
>>> > WHERE
>>> > house.id = 1
>>> >
>>> > I'm not familiar enough with SQL to have figured out a way to make this
>>> > kind of
>>> > query using operators that are supported across most of the DB's
>>> Calcite has
>>> > adapters for.
>>> >
>>> > Currently what I have done instead, on a tip from Gopalakrishna Holla
>>> from
>>> > LinkedIn Coral team who has built GraphQL-on-Calcite, was to break up
>>> the
>>> > query
>>> > into individual parts and then do the join in-memory:
>>> >
>>> > SELECT ... FROM users;
>>> > SELECT ... FROM todos WHERE todos.user_id IN (ResultSet from prev
>>> response);
>>> >
>>> > However, the way I am doing this seems like it's probably very
>>> inefficient.
>>> > Because I do a series of nested loops to add the key to each object in
>>> the
>>> > parent ResultSet row:
>>> >
>>> >
>>> https://github.com/GavinRay97/GraphQLCalcite/blob/648e0ac4f6810a3c360d13a03e6597c33406de4b/src/main/kotlin/TableDataFetcher.kt#L135-L153
>>> >
>>> > Is there some better way of doing this?
>>> > I would be eternally grateful for any advice.
>>> >
>>> >
>>> > On Thu, Feb 10, 2022 at 3:27 PM Julian Hyde <[email protected]>
>>> wrote:
>>> >
>>> >> Yes, if you want to do multiple layers of aggregation, use CTEs
>>> (WITH) or
>>> >> nested sub-queries. For example, the following is I believe valid
>>> standard
>>> >> SQL, and actually computes something useful:
>>> >>
>>> >> WITH q1 AS
>>> >> (SELECT deptno, job, AVG(sal) AS avg_sal
>>> >> FROM emp
>>> >> GROUP BY deptno, job)
>>> >> WITH q2 AS
>>> >> (SELECT deptno, AVG(avg_sal) AS avg_avg_sal
>>> >> FROM q1
>>> >> GROUP BY deptno)
>>> >> SELECT AVG(avg_avg_sal)
>>> >> FROM q2
>>> >> GROUP BY ()
>>> >>
>>> >> (You can omit the “GROUP BY ()” line, but I think it makes things
>>> clearer.)
>>> >>
>>> >> Julian
>>> >>
>>> >>
>>> >>
>>> >>> On Feb 10, 2022, at 12:17 PM, Justin Swanhart <[email protected]>
>>> >> wrote:
>>> >>>
>>> >>> I wish you could unsend emails :) Answering my own question, no,
>>> because
>>> >>> that would return three rows with the average :D
>>> >>>
>>> >>> On Thu, Feb 10, 2022 at 3:16 PM Justin Swanhart <[email protected]
>>> >
>>> >> wrote:
>>> >>>
>>> >>>> Just out of curiosity, is the second level aggregation using AVG in
>>> a
>>> >>>> window context? It the frame is the whole table and it aggregates
>>> over
>>> >> it?
>>> >>>>
>>> >>>> On Thu, Feb 10, 2022 at 3:12 PM Justin Swanhart <
>>> [email protected]>
>>> >>>> wrote:
>>> >>>>
>>> >>>>> That is really neat about Oracle.
>>> >>>>>
>>> >>>>> The alternative in general is to use a subquery:
>>> >>>>> SELECT avg(avg(sal)) FROM emp GROUP BY deptno;
>>> >>>>> becomes
>>> >>>>> select avg(the_avg)
>>> >>>>> from (select avg(sal) from emp group b deptno) an_alias;
>>> >>>>>
>>> >>>>> or
>>> >>>>>
>>> >>>>> with the_cte as (select avg(sal) x from emp group by deptno)
>>> >>>>> select avg(x) from the_cte;
>>> >>>>>
>>> >>>>> On Thu, Feb 10, 2022 at 3:03 PM Julian Hyde <
>>> [email protected]>
>>> >>>>> wrote:
>>> >>>>>
>>> >>>>>> Some databases, e.g. Oracle, allow TWO levels of nesting:
>>> >>>>>>
>>> >>>>>> SELECT avg(sal) FROM emp GROUP BY deptno;
>>> >>>>>>
>>> >>>>>> AVG(SAL)
>>> >>>>>> ========
>>> >>>>>> 1,566.67
>>> >>>>>> 2,175.00
>>> >>>>>> 2,916.65
>>> >>>>>>
>>> >>>>>> SELECT avg(avg(sal)) FROM emp GROUP BY deptno;
>>> >>>>>>
>>> >>>>>> AVG(SUM(SAL))
>>> >>>>>> =============
>>> >>>>>> 9,675
>>> >>>>>>
>>> >>>>>> The first level aggregates by department (returning 3 records),
>>> and
>>> >> the
>>> >>>>>> second level computes the grand total (returning 1 record). But
>>> that
>>> >> is an
>>> >>>>>> exceptional case.
>>> >>>>>>
>>> >>>>>> Generally, any expression in the SELECT or HAVING clause of an
>>> >> aggregate
>>> >>>>>> query is either ‘before’ or ‘after’ aggregation. Consider
>>> >>>>>>
>>> >>>>>> SELECT t.x + 1 AS a, 2 + SUM(t.y + 3) AS b
>>> >>>>>> FROM t
>>> >>>>>> GROUP BY t.x
>>> >>>>>>
>>> >>>>>> The expressions “t.y” and “t.y + 3” occur before aggregation;
>>> “t.x”,
>>> >>>>>> “t.x + 1”, “SUM(t.y + 3)” and “2 + SUM(t.y + 3)” occur after
>>> >> aggregation.
>>> >>>>>> SQL semantics rely heavily on this stratification. Allowing an
>>> extra
>>> >> level
>>> >>>>>> of aggregation would mess it all up.
>>> >>>>>>
>>> >>>>>> Julian
>>> >>>>>>
>>> >>>>>>
>>> >>>>>>
>>> >>>>>>> On Feb 10, 2022, at 9:45 AM, Justin Swanhart <
>>> [email protected]>
>>> >>>>>> wrote:
>>> >>>>>>>
>>> >>>>>>> This is a SQL limitation.
>>> >>>>>>>
>>> >>>>>>> mysql> select sum(1);
>>> >>>>>>> +--------+
>>> >>>>>>> | sum(1) |
>>> >>>>>>> +--------+
>>> >>>>>>> | 1 |
>>> >>>>>>> +--------+
>>> >>>>>>> 1 row in set (0.00 sec)
>>> >>>>>>>
>>> >>>>>>> mysql> select sum(sum(1));
>>> >>>>>>> ERROR 1111 (HY000): Invalid use of group function
>>> >>>>>>>
>>> >>>>>>> On Thu, Feb 10, 2022 at 12:39 PM Gavin Ray <
>>> [email protected]>
>>> >>>>>> wrote:
>>> >>>>>>>
>>> >>>>>>>> Went to test this query out and found that it can't be
>>> performed:
>>> >>>>>>>>
>>> >>>>>>>> SELECT
>>> >>>>>>>> JSON_OBJECT(
>>> >>>>>>>> KEY 'users'
>>> >>>>>>>> VALUE JSON_ARRAYAGG(
>>> >>>>>>>> JSON_OBJECT(
>>> >>>>>>>> KEY 'name' VALUE "users"."name",
>>> >>>>>>>> KEY 'todos' VALUE JSON_ARRAYAGG(
>>> >>>>>>>> JSON_OBJECT(
>>> >>>>>>>> KEY 'description' VALUE
>>> "todos"."description"
>>> >>>>>>>> )
>>> >>>>>>>> )
>>> >>>>>>>> )
>>> >>>>>>>> )
>>> >>>>>>>> )
>>> >>>>>>>> FROM
>>> >>>>>>>> "users"
>>> >>>>>>>> LEFT OUTER JOIN
>>> >>>>>>>> "todos" ON "users"."id" = "todos"."user_id";
>>> >>>>>>>>
>>> >>>>>>>> Checking the source, seems this is a blanket policy, not a
>>> >>>>>>>> datasource-specific thing.
>>> >>>>>>>> From a functional perspective, it doesn't feel like it's much
>>> >>>>>> different
>>> >>>>>>>> from JOINs
>>> >>>>>>>> But I don't understand relational theory or DB functionality in
>>> the
>>> >>>>>> least,
>>> >>>>>>>> so I'm not fit to judge.
>>> >>>>>>>>
>>> >>>>>>>> Just curious why Calcite doesn't allow this
>>> >>>>>>>>
>>> >>>>>>
>>> >>>>>>
>>> >>
>>> >>
>>>
>>>