No dice still unfortunately =/ If it's any easier, I put a self-contained single class reproduction on Github: https://github.com/GavinRay97/calcite-nested-json-subquery-bug/blob/main/src/test/java/com/example/AppTest.java
Is it worth filing a JIRA ticket over you think? On Wed, Feb 23, 2022 at 3:45 AM Julian Hyde <[email protected]> wrote: > Try ‘value ((‘ in place of ‘value (‘. > > Julian > > > On Feb 21, 2022, at 9:33 AM, Gavin Ray <[email protected]> wrote: > > > > I hadn't thought about the fact that ORM's probably have to solve this > > problem as well > > That is a great suggestion, I will try to investigate some of the popular > > ORM codebases and see if there are any tricks they are using. > > > > I seem to maybe be getting a tiny bit closer by using subqueries like > > Julian suggested instead of operator calls > > But if I may ask what is probably a very stupid question: > > > > What might the error message > > "parse failed: Query expression encountered in illegal context > > (state=,code=0)" > > > > Mean in the below query? > > > > The reason why I am confused is because the query runs if I remove the > > innermost subquery ("todos") > > But the innermost subquery is a direct copy-paste of the subquery above > it, > > so I know it MUST be valid > > > > As usual, thank you so much for your help/guidance Stamatis. > > > > select > > "g0"."id" "id", > > "g0"."address" "address", > > ( > > select json_arrayagg(json_object( > > key 'id' value "g1"."id", > > key 'todos' value ( > > select json_arrayagg(json_object( > > key 'id' value "g2"."id", > > key 'description' value "g2"."description", > > )) > > from ( > > select * from "todos" > > where "g1"."id" = "user_id" > > order by "id" > > ) "g2" > > ) > > )) > > from ( > > select * from "users" > > where "g0"."id" = "house_id" > > order by "id" > > ) "g1" > > ) "users" > > from "houses" "g0" > > order by "g0"."id" > > > >> On Mon, Feb 21, 2022 at 8:07 AM Stamatis Zampetakis <[email protected]> > >> wrote: > >> > >> Hi Gavin, > >> > >> A few more comments in case they help to get you a bit further on your > >> work. > >> > >> The need to return the result as a single object is a common problem in > >> object relational mapping (ORM) frameworks/APIS (JPA, Datanucleus, > >> Hibernate, etc.). Apart from the suggestions so far maybe you could look > >> into these frameworks as well for more inspiration. > >> > >> Moreover your approach of decomposing the query into individual parts is > >> commonly known as the N+1 problem [1]. > >> > >> Lastly, keep in mind that you can introduce custom UDF, UDAF functions > if > >> you need more flexibility on reconstructing the final result. > >> > >> Best, > >> Stamatis > >> > >> [1] > >> > >> > https://stackoverflow.com/questions/97197/what-is-the-n1-selects-problem-in-orm-object-relational-mapping > >> > >>> On Sun, Feb 13, 2022 at 3:59 AM Gavin Ray <[email protected]> > wrote: > >>> > >>> Ah wait nevermind, got excited and spoke too soon. Looking at it more > >>> closely, that data isn't correct. > >>> At least it's in somewhat the right shape, ha! > >>> > >>>> On Sat, Feb 12, 2022 at 9:57 PM Gavin Ray <[email protected]> > wrote: > >>> > >>>> After ~5 hours, I think I may have made some progress =) > >>>> > >>>> I have this, which currently works. The problem is that the nested > >>> columns > >>>> don't have names on them. > >>>> Since I need to return a nested "Map<String, Object>", I have to > figure > >>>> out how to convert this query into a form that gives column names. > >>>> > >>>> But this is still great progress I think! > >>>> > >>>> SELECT > >>>> "todos".*, > >>>> ARRAY( > >>>> SELECT > >>>> "users".*, > >>>> ARRAY( > >>>> SELECT > >>>> "todos".* > >>>> FROM > >>>> "todos" > >>>> ) AS "todos" > >>>> FROM > >>>> "users" > >>>> ) AS "users" > >>>> FROM > >>>> "todos" > >>>> WHERE > >>>> "user_id" IN ( > >>>> SELECT > >>>> "user_id" > >>>> FROM > >>>> "users" > >>>> WHERE > >>>> "house_id" IN ( > >>>> SELECT > >>>> "id" > >>>> FROM > >>>> "houses" > >>>> ) > >>>> ); > >>>> > >>>> > >>>> > >>>> > >>> > >> > +----+---------+------------------------+------------------------------------------------------------------------------+ > >>>> | id | user_id | description | > >>>> | > >>>> > >>>> > >>> > >> > +----+---------+------------------------+------------------------------------------------------------------------------+ > >>>> | 1 | 1 | Take out the trash | [{1, John, 1, [{1, 1, Take > >> out > >>>> the trash}, {2, 1, Watch my favorite show}, { | > >>>> | 2 | 1 | Watch my favorite show | [{1, John, 1, [{1, 1, Take > >> out > >>>> the trash}, {2, 1, Watch my favorite show}, { | > >>>> | 3 | 1 | Charge my phone | [{1, John, 1, [{1, 1, Take > >> out > >>>> the trash}, {2, 1, Watch my favorite show}, { | > >>>> | 4 | 2 | Cook dinner | [{1, John, 1, [{1, 1, Take > >> out > >>>> the trash}, {2, 1, Watch my favorite show}, { | > >>>> | 5 | 2 | Read a book | [{1, John, 1, [{1, 1, Take > >> out > >>>> the trash}, {2, 1, Watch my favorite show}, { | > >>>> | 6 | 2 | Organize office | [{1, John, 1, [{1, 1, Take > >> out > >>>> the trash}, {2, 1, Watch my favorite show}, { | > >>>> | 7 | 3 | Walk the dog | [{1, John, 1, [{1, 1, Take > >> out > >>>> the trash}, {2, 1, Watch my favorite show}, { | > >>>> | 8 | 3 | Feed the cat | [{1, John, 1, [{1, 1, Take > >> out > >>>> the trash}, {2, 1, Watch my favorite show}, { | > >>>> > >>>> > >>> > >> > +----+---------+------------------------+------------------------------------------------------------------------------+ > >>>> > >>>> On Sat, Feb 12, 2022 at 4:13 PM Gavin Ray <[email protected]> > >> wrote: > >>>> > >>>>> Nevermind, this is a standard term not something Calcite-specific it > >>>>> seems! > >>>>> > >>>>> https://en.wikipedia.org/wiki/Correlated_subquery > >>>>> > >>>>> On Sat, Feb 12, 2022 at 3:46 PM Gavin Ray <[email protected]> > >>> wrote: > >>>>> > >>>>>> Forgive my ignorance/lack of experience > >>>>>> > >>>>>> I am somewhat familiar with the ARRAY() function, but not sure I > know > >>>>>> the term "correlated" > >>>>>> Searching the Calcite codebase for uses of "correlated" + "query", I > >>>>>> found: > >>>>>> > >>>>>> > >>>>>> > >>> > >> > https://github.com/apache/calcite/blob/1d4f1b394bfdba03c5538017e12ab2431b137ca9/core/src/test/java/org/apache/calcite/test/SqlToRelConverterTest.java#L1603-L1612 > >>>>>> > >>>>>> @Test void testCorrelatedSubQueryInJoin() { > >>>>>> final String sql = "select *\n" > >>>>>> + "from emp as e\n" > >>>>>> + "join dept as d using (deptno)\n" > >>>>>> + "where d.name = (\n" > >>>>>> + " select max(name)\n" > >>>>>> + " from dept as d2\n" > >>>>>> + " where d2.deptno = d.deptno)"; > >>>>>> sql(sql).withExpand(false).ok(); > >>>>>> } > >>>>>> > >>>>>> But I also see this, which says it is "uncorrelated" but seems very > >>>>>> similar? > >>>>>> > >>>>>> @Test void testInUncorrelatedSubQuery() { > >>>>>> final String sql = "select empno from emp where deptno in" > >>>>>> + " (select deptno from dept)"; > >>>>>> sql(sql).ok(); > >>>>>> } > >>>>>> > >>>>>> I wouldn't blame you for not answering such a basic question -- but > >>> what > >>>>>> exactly does "correlation" mean here? > >>>>>> > >>>>>> Thanks, as usual Julian > >>>>>> > >>>>>> > >>>>>> On Sat, Feb 12, 2022 at 3:08 PM Julian Hyde <[email protected] > > > >>>>>> wrote: > >>>>>> > >>>>>>> Correlated ARRAY sub-query? > >>>>>>> > >>>>>>>> On Feb 12, 2022, at 10:40 AM, Gavin Ray <[email protected]> > >>>>>>> wrote: > >>>>>>>> > >>>>>>>> Apologies for the delay in replying > >>>>>>>> > >>>>>>>> This makes things clear and seems obvious now that you point it > >> out. > >>>>>>>> Thank you, Justin and Julian =) > >>>>>>>> > >>>>>>>> Let me ask another question (if I may) that I am struggling to > >>> phrase > >>>>>>>> easily. > >>>>>>>> > >>>>>>>> So with GraphQL, you might have a query like: > >>>>>>>> - "Get houses" > >>>>>>>> - "For each house get the user that lives in the house > >>>>>>>> - "And for each user get their list of todos" > >>>>>>>> > >>>>>>>> The result has to come back such that it's a single object for > >> each > >>>>>>> row > >>>>>>>> ============== > >>>>>>>> { > >>>>>>>> houses: [{ > >>>>>>>> address: "123 Main Street", > >>>>>>>> users: [{ > >>>>>>>> name: "Joe", > >>>>>>>> todos: [{ > >>>>>>>> description: "Take out trash" > >>>>>>>> }] > >>>>>>>> }] > >>>>>>>> } > >>>>>>>> > >>>>>>>> From a SQL perspective, the logical equivalent would be something > >>>>>>> like: > >>>>>>>> ============== > >>>>>>>> SELECT > >>>>>>>> house.address, > >>>>>>>> (somehow nest users + double-nest todos under user) > >>>>>>>> FROM > >>>>>>>> house > >>>>>>>> JOIN > >>>>>>>> user ON user.house_id = house.id > >>>>>>>> todos ON todos.user_id = user.id > >>>>>>>> WHERE > >>>>>>>> house.id = 1 > >>>>>>>> > >>>>>>>> I'm not familiar enough with SQL to have figured out a way to make > >>>>>>> this > >>>>>>>> kind of > >>>>>>>> query using operators that are supported across most of the DB's > >>>>>>> Calcite has > >>>>>>>> adapters for. > >>>>>>>> > >>>>>>>> Currently what I have done instead, on a tip from Gopalakrishna > >>> Holla > >>>>>>> from > >>>>>>>> LinkedIn Coral team who has built GraphQL-on-Calcite, was to break > >>> up > >>>>>>> the > >>>>>>>> query > >>>>>>>> into individual parts and then do the join in-memory: > >>>>>>>> > >>>>>>>> SELECT ... FROM users; > >>>>>>>> SELECT ... FROM todos WHERE todos.user_id IN (ResultSet from prev > >>>>>>> response); > >>>>>>>> > >>>>>>>> However, the way I am doing this seems like it's probably very > >>>>>>> inefficient. > >>>>>>>> Because I do a series of nested loops to add the key to each > >> object > >>>>>>> in the > >>>>>>>> parent ResultSet row: > >>>>>>>> > >>>>>>>> > >>>>>>> > >>> > >> > https://github.com/GavinRay97/GraphQLCalcite/blob/648e0ac4f6810a3c360d13a03e6597c33406de4b/src/main/kotlin/TableDataFetcher.kt#L135-L153 > >>>>>>>> > >>>>>>>> Is there some better way of doing this? > >>>>>>>> I would be eternally grateful for any advice. > >>>>>>>> > >>>>>>>> > >>>>>>>> On Thu, Feb 10, 2022 at 3:27 PM Julian Hyde < > >> [email protected] > >>>> > >>>>>>> wrote: > >>>>>>>> > >>>>>>>>> Yes, if you want to do multiple layers of aggregation, use CTEs > >>>>>>> (WITH) or > >>>>>>>>> nested sub-queries. For example, the following is I believe valid > >>>>>>> standard > >>>>>>>>> SQL, and actually computes something useful: > >>>>>>>>> > >>>>>>>>> WITH q1 AS > >>>>>>>>> (SELECT deptno, job, AVG(sal) AS avg_sal > >>>>>>>>> FROM emp > >>>>>>>>> GROUP BY deptno, job) > >>>>>>>>> WITH q2 AS > >>>>>>>>> (SELECT deptno, AVG(avg_sal) AS avg_avg_sal > >>>>>>>>> FROM q1 > >>>>>>>>> GROUP BY deptno) > >>>>>>>>> SELECT AVG(avg_avg_sal) > >>>>>>>>> FROM q2 > >>>>>>>>> GROUP BY () > >>>>>>>>> > >>>>>>>>> (You can omit the “GROUP BY ()” line, but I think it makes things > >>>>>>> clearer.) > >>>>>>>>> > >>>>>>>>> Julian > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> > >>>>>>>>>> On Feb 10, 2022, at 12:17 PM, Justin Swanhart < > >>> [email protected]> > >>>>>>>>> wrote: > >>>>>>>>>> > >>>>>>>>>> I wish you could unsend emails :) Answering my own question, > >> no, > >>>>>>> because > >>>>>>>>>> that would return three rows with the average :D > >>>>>>>>>> > >>>>>>>>>> On Thu, Feb 10, 2022 at 3:16 PM Justin Swanhart < > >>>>>>> [email protected]> > >>>>>>>>> wrote: > >>>>>>>>>> > >>>>>>>>>>> Just out of curiosity, is the second level aggregation using > >> AVG > >>>>>>> in a > >>>>>>>>>>> window context? It the frame is the whole table and it > >>> aggregates > >>>>>>> over > >>>>>>>>> it? > >>>>>>>>>>> > >>>>>>>>>>> On Thu, Feb 10, 2022 at 3:12 PM Justin Swanhart < > >>>>>>> [email protected]> > >>>>>>>>>>> wrote: > >>>>>>>>>>> > >>>>>>>>>>>> That is really neat about Oracle. > >>>>>>>>>>>> > >>>>>>>>>>>> The alternative in general is to use a subquery: > >>>>>>>>>>>> SELECT avg(avg(sal)) FROM emp GROUP BY deptno; > >>>>>>>>>>>> becomes > >>>>>>>>>>>> select avg(the_avg) > >>>>>>>>>>>> from (select avg(sal) from emp group b deptno) an_alias; > >>>>>>>>>>>> > >>>>>>>>>>>> or > >>>>>>>>>>>> > >>>>>>>>>>>> with the_cte as (select avg(sal) x from emp group by deptno) > >>>>>>>>>>>> select avg(x) from the_cte; > >>>>>>>>>>>> > >>>>>>>>>>>> On Thu, Feb 10, 2022 at 3:03 PM Julian Hyde < > >>>>>>> [email protected]> > >>>>>>>>>>>> wrote: > >>>>>>>>>>>> > >>>>>>>>>>>>> Some databases, e.g. Oracle, allow TWO levels of nesting: > >>>>>>>>>>>>> > >>>>>>>>>>>>> SELECT avg(sal) FROM emp GROUP BY deptno; > >>>>>>>>>>>>> > >>>>>>>>>>>>> AVG(SAL) > >>>>>>>>>>>>> ======== > >>>>>>>>>>>>> 1,566.67 > >>>>>>>>>>>>> 2,175.00 > >>>>>>>>>>>>> 2,916.65 > >>>>>>>>>>>>> > >>>>>>>>>>>>> SELECT avg(avg(sal)) FROM emp GROUP BY deptno; > >>>>>>>>>>>>> > >>>>>>>>>>>>> AVG(SUM(SAL)) > >>>>>>>>>>>>> ============= > >>>>>>>>>>>>> 9,675 > >>>>>>>>>>>>> > >>>>>>>>>>>>> The first level aggregates by department (returning 3 > >> records), > >>>>>>> and > >>>>>>>>> the > >>>>>>>>>>>>> second level computes the grand total (returning 1 record). > >> But > >>>>>>> that > >>>>>>>>> is an > >>>>>>>>>>>>> exceptional case. > >>>>>>>>>>>>> > >>>>>>>>>>>>> Generally, any expression in the SELECT or HAVING clause of > >> an > >>>>>>>>> aggregate > >>>>>>>>>>>>> query is either ‘before’ or ‘after’ aggregation. Consider > >>>>>>>>>>>>> > >>>>>>>>>>>>> SELECT t.x + 1 AS a, 2 + SUM(t.y + 3) AS b > >>>>>>>>>>>>> FROM t > >>>>>>>>>>>>> GROUP BY t.x > >>>>>>>>>>>>> > >>>>>>>>>>>>> The expressions “t.y” and “t.y + 3” occur before aggregation; > >>>>>>> “t.x”, > >>>>>>>>>>>>> “t.x + 1”, “SUM(t.y + 3)” and “2 + SUM(t.y + 3)” occur after > >>>>>>>>> aggregation. > >>>>>>>>>>>>> SQL semantics rely heavily on this stratification. Allowing > >> an > >>>>>>> extra > >>>>>>>>> level > >>>>>>>>>>>>> of aggregation would mess it all up. > >>>>>>>>>>>>> > >>>>>>>>>>>>> Julian > >>>>>>>>>>>>> > >>>>>>>>>>>>> > >>>>>>>>>>>>> > >>>>>>>>>>>>>> On Feb 10, 2022, at 9:45 AM, Justin Swanhart < > >>>>>>> [email protected]> > >>>>>>>>>>>>> wrote: > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> This is a SQL limitation. > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> mysql> select sum(1); > >>>>>>>>>>>>>> +--------+ > >>>>>>>>>>>>>> | sum(1) | > >>>>>>>>>>>>>> +--------+ > >>>>>>>>>>>>>> | 1 | > >>>>>>>>>>>>>> +--------+ > >>>>>>>>>>>>>> 1 row in set (0.00 sec) > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> mysql> select sum(sum(1)); > >>>>>>>>>>>>>> ERROR 1111 (HY000): Invalid use of group function > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> On Thu, Feb 10, 2022 at 12:39 PM Gavin Ray < > >>>>>>> [email protected]> > >>>>>>>>>>>>> wrote: > >>>>>>>>>>>>>> > >>>>>>>>>>>>>>> Went to test this query out and found that it can't be > >>>>>>> performed: > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> SELECT > >>>>>>>>>>>>>>> JSON_OBJECT( > >>>>>>>>>>>>>>> KEY 'users' > >>>>>>>>>>>>>>> VALUE JSON_ARRAYAGG( > >>>>>>>>>>>>>>> JSON_OBJECT( > >>>>>>>>>>>>>>> KEY 'name' VALUE "users"."name", > >>>>>>>>>>>>>>> KEY 'todos' VALUE JSON_ARRAYAGG( > >>>>>>>>>>>>>>> JSON_OBJECT( > >>>>>>>>>>>>>>> KEY 'description' VALUE > >>>>>>> "todos"."description" > >>>>>>>>>>>>>>> ) > >>>>>>>>>>>>>>> ) > >>>>>>>>>>>>>>> ) > >>>>>>>>>>>>>>> ) > >>>>>>>>>>>>>>> ) > >>>>>>>>>>>>>>> FROM > >>>>>>>>>>>>>>> "users" > >>>>>>>>>>>>>>> LEFT OUTER JOIN > >>>>>>>>>>>>>>> "todos" ON "users"."id" = "todos"."user_id"; > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> Checking the source, seems this is a blanket policy, not a > >>>>>>>>>>>>>>> datasource-specific thing. > >>>>>>>>>>>>>>> From a functional perspective, it doesn't feel like it's > >> much > >>>>>>>>>>>>> different > >>>>>>>>>>>>>>> from JOINs > >>>>>>>>>>>>>>> But I don't understand relational theory or DB > >> functionality > >>>>>>> in the > >>>>>>>>>>>>> least, > >>>>>>>>>>>>>>> so I'm not fit to judge. > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> Just curious why Calcite doesn't allow this > >>>>>>>>>>>>>>> > >>>>>>>>>>>>> > >>>>>>>>>>>>> > >>>>>>>>> > >>>>>>>>> > >>>>>>> > >>>>>>> > >>> > >> >
