That's a valid point, let me see check what some common dialects will accept for this sort of thing
On Wed, Feb 23, 2022 at 12:36 PM Julian Hyde <[email protected]> wrote: > It’s a parser error. That usually means that the user has made a mistake. > > Try your SQL on another DB with JSON support before you declare this a > Calcite bug. > > Julian > > > On Feb 23, 2022, at 09:22, Gavin Ray <[email protected]> wrote: > > > > No dice still unfortunately =/ > > > > If it's any easier, I put a self-contained single class reproduction on > > Github: > > > https://github.com/GavinRay97/calcite-nested-json-subquery-bug/blob/main/src/test/java/com/example/AppTest.java > > > > Is it worth filing a JIRA ticket over you think? > > > > > >> On Wed, Feb 23, 2022 at 3:45 AM Julian Hyde <[email protected]> > wrote: > >> > >> Try ‘value ((‘ in place of ‘value (‘. > >> > >> Julian > >> > >>>> On Feb 21, 2022, at 9:33 AM, Gavin Ray <[email protected]> wrote: > >>> > >>> I hadn't thought about the fact that ORM's probably have to solve this > >>> problem as well > >>> That is a great suggestion, I will try to investigate some of the > popular > >>> ORM codebases and see if there are any tricks they are using. > >>> > >>> I seem to maybe be getting a tiny bit closer by using subqueries like > >>> Julian suggested instead of operator calls > >>> But if I may ask what is probably a very stupid question: > >>> > >>> What might the error message > >>> "parse failed: Query expression encountered in illegal context > >>> (state=,code=0)" > >>> > >>> Mean in the below query? > >>> > >>> The reason why I am confused is because the query runs if I remove the > >>> innermost subquery ("todos") > >>> But the innermost subquery is a direct copy-paste of the subquery above > >> it, > >>> so I know it MUST be valid > >>> > >>> As usual, thank you so much for your help/guidance Stamatis. > >>> > >>> select > >>> "g0"."id" "id", > >>> "g0"."address" "address", > >>> ( > >>> select json_arrayagg(json_object( > >>> key 'id' value "g1"."id", > >>> key 'todos' value ( > >>> select json_arrayagg(json_object( > >>> key 'id' value "g2"."id", > >>> key 'description' value "g2"."description", > >>> )) > >>> from ( > >>> select * from "todos" > >>> where "g1"."id" = "user_id" > >>> order by "id" > >>> ) "g2" > >>> ) > >>> )) > >>> from ( > >>> select * from "users" > >>> where "g0"."id" = "house_id" > >>> order by "id" > >>> ) "g1" > >>> ) "users" > >>> from "houses" "g0" > >>> order by "g0"."id" > >>> > >>>> On Mon, Feb 21, 2022 at 8:07 AM Stamatis Zampetakis < > [email protected]> > >>>> wrote: > >>>> > >>>> Hi Gavin, > >>>> > >>>> A few more comments in case they help to get you a bit further on your > >>>> work. > >>>> > >>>> The need to return the result as a single object is a common problem > in > >>>> object relational mapping (ORM) frameworks/APIS (JPA, Datanucleus, > >>>> Hibernate, etc.). Apart from the suggestions so far maybe you could > look > >>>> into these frameworks as well for more inspiration. > >>>> > >>>> Moreover your approach of decomposing the query into individual parts > is > >>>> commonly known as the N+1 problem [1]. > >>>> > >>>> Lastly, keep in mind that you can introduce custom UDF, UDAF functions > >> if > >>>> you need more flexibility on reconstructing the final result. > >>>> > >>>> Best, > >>>> Stamatis > >>>> > >>>> [1] > >>>> > >>>> > >> > https://stackoverflow.com/questions/97197/what-is-the-n1-selects-problem-in-orm-object-relational-mapping > >>>> > >>>>> On Sun, Feb 13, 2022 at 3:59 AM Gavin Ray <[email protected]> > >> wrote: > >>>>> > >>>>> Ah wait nevermind, got excited and spoke too soon. Looking at it more > >>>>> closely, that data isn't correct. > >>>>> At least it's in somewhat the right shape, ha! > >>>>> > >>>>>> On Sat, Feb 12, 2022 at 9:57 PM Gavin Ray <[email protected]> > >> wrote: > >>>>> > >>>>>> After ~5 hours, I think I may have made some progress =) > >>>>>> > >>>>>> I have this, which currently works. The problem is that the nested > >>>>> columns > >>>>>> don't have names on them. > >>>>>> Since I need to return a nested "Map<String, Object>", I have to > >> figure > >>>>>> out how to convert this query into a form that gives column names. > >>>>>> > >>>>>> But this is still great progress I think! > >>>>>> > >>>>>> SELECT > >>>>>> "todos".*, > >>>>>> ARRAY( > >>>>>> SELECT > >>>>>> "users".*, > >>>>>> ARRAY( > >>>>>> SELECT > >>>>>> "todos".* > >>>>>> FROM > >>>>>> "todos" > >>>>>> ) AS "todos" > >>>>>> FROM > >>>>>> "users" > >>>>>> ) AS "users" > >>>>>> FROM > >>>>>> "todos" > >>>>>> WHERE > >>>>>> "user_id" IN ( > >>>>>> SELECT > >>>>>> "user_id" > >>>>>> FROM > >>>>>> "users" > >>>>>> WHERE > >>>>>> "house_id" IN ( > >>>>>> SELECT > >>>>>> "id" > >>>>>> FROM > >>>>>> "houses" > >>>>>> ) > >>>>>> ); > >>>>>> > >>>>>> > >>>>>> > >>>>>> > >>>>> > >>>> > >> > +----+---------+------------------------+------------------------------------------------------------------------------+ > >>>>>> | id | user_id | description | > >>>>>> | > >>>>>> > >>>>>> > >>>>> > >>>> > >> > +----+---------+------------------------+------------------------------------------------------------------------------+ > >>>>>> | 1 | 1 | Take out the trash | [{1, John, 1, [{1, 1, Take > >>>> out > >>>>>> the trash}, {2, 1, Watch my favorite show}, { | > >>>>>> | 2 | 1 | Watch my favorite show | [{1, John, 1, [{1, 1, Take > >>>> out > >>>>>> the trash}, {2, 1, Watch my favorite show}, { | > >>>>>> | 3 | 1 | Charge my phone | [{1, John, 1, [{1, 1, Take > >>>> out > >>>>>> the trash}, {2, 1, Watch my favorite show}, { | > >>>>>> | 4 | 2 | Cook dinner | [{1, John, 1, [{1, 1, Take > >>>> out > >>>>>> the trash}, {2, 1, Watch my favorite show}, { | > >>>>>> | 5 | 2 | Read a book | [{1, John, 1, [{1, 1, Take > >>>> out > >>>>>> the trash}, {2, 1, Watch my favorite show}, { | > >>>>>> | 6 | 2 | Organize office | [{1, John, 1, [{1, 1, Take > >>>> out > >>>>>> the trash}, {2, 1, Watch my favorite show}, { | > >>>>>> | 7 | 3 | Walk the dog | [{1, John, 1, [{1, 1, Take > >>>> out > >>>>>> the trash}, {2, 1, Watch my favorite show}, { | > >>>>>> | 8 | 3 | Feed the cat | [{1, John, 1, [{1, 1, Take > >>>> out > >>>>>> the trash}, {2, 1, Watch my favorite show}, { | > >>>>>> > >>>>>> > >>>>> > >>>> > >> > +----+---------+------------------------+------------------------------------------------------------------------------+ > >>>>>> > >>>>>> On Sat, Feb 12, 2022 at 4:13 PM Gavin Ray <[email protected]> > >>>> wrote: > >>>>>> > >>>>>>> Nevermind, this is a standard term not something Calcite-specific > it > >>>>>>> seems! > >>>>>>> > >>>>>>> https://en.wikipedia.org/wiki/Correlated_subquery > >>>>>>> > >>>>>>> On Sat, Feb 12, 2022 at 3:46 PM Gavin Ray <[email protected]> > >>>>> wrote: > >>>>>>> > >>>>>>>> Forgive my ignorance/lack of experience > >>>>>>>> > >>>>>>>> I am somewhat familiar with the ARRAY() function, but not sure I > >> know > >>>>>>>> the term "correlated" > >>>>>>>> Searching the Calcite codebase for uses of "correlated" + > "query", I > >>>>>>>> found: > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>> > >>>> > >> > https://github.com/apache/calcite/blob/1d4f1b394bfdba03c5538017e12ab2431b137ca9/core/src/test/java/org/apache/calcite/test/SqlToRelConverterTest.java#L1603-L1612 > >>>>>>>> > >>>>>>>> @Test void testCorrelatedSubQueryInJoin() { > >>>>>>>> final String sql = "select *\n" > >>>>>>>> + "from emp as e\n" > >>>>>>>> + "join dept as d using (deptno)\n" > >>>>>>>> + "where d.name = (\n" > >>>>>>>> + " select max(name)\n" > >>>>>>>> + " from dept as d2\n" > >>>>>>>> + " where d2.deptno = d.deptno)"; > >>>>>>>> sql(sql).withExpand(false).ok(); > >>>>>>>> } > >>>>>>>> > >>>>>>>> But I also see this, which says it is "uncorrelated" but seems > very > >>>>>>>> similar? > >>>>>>>> > >>>>>>>> @Test void testInUncorrelatedSubQuery() { > >>>>>>>> final String sql = "select empno from emp where deptno in" > >>>>>>>> + " (select deptno from dept)"; > >>>>>>>> sql(sql).ok(); > >>>>>>>> } > >>>>>>>> > >>>>>>>> I wouldn't blame you for not answering such a basic question -- > but > >>>>> what > >>>>>>>> exactly does "correlation" mean here? > >>>>>>>> > >>>>>>>> Thanks, as usual Julian > >>>>>>>> > >>>>>>>> > >>>>>>>> On Sat, Feb 12, 2022 at 3:08 PM Julian Hyde < > [email protected] > >>> > >>>>>>>> wrote: > >>>>>>>> > >>>>>>>>> Correlated ARRAY sub-query? > >>>>>>>>> > >>>>>>>>>> On Feb 12, 2022, at 10:40 AM, Gavin Ray <[email protected]> > >>>>>>>>> wrote: > >>>>>>>>>> > >>>>>>>>>> Apologies for the delay in replying > >>>>>>>>>> > >>>>>>>>>> This makes things clear and seems obvious now that you point it > >>>> out. > >>>>>>>>>> Thank you, Justin and Julian =) > >>>>>>>>>> > >>>>>>>>>> Let me ask another question (if I may) that I am struggling to > >>>>> phrase > >>>>>>>>>> easily. > >>>>>>>>>> > >>>>>>>>>> So with GraphQL, you might have a query like: > >>>>>>>>>> - "Get houses" > >>>>>>>>>> - "For each house get the user that lives in the house > >>>>>>>>>> - "And for each user get their list of todos" > >>>>>>>>>> > >>>>>>>>>> The result has to come back such that it's a single object for > >>>> each > >>>>>>>>> row > >>>>>>>>>> ============== > >>>>>>>>>> { > >>>>>>>>>> houses: [{ > >>>>>>>>>> address: "123 Main Street", > >>>>>>>>>> users: [{ > >>>>>>>>>> name: "Joe", > >>>>>>>>>> todos: [{ > >>>>>>>>>> description: "Take out trash" > >>>>>>>>>> }] > >>>>>>>>>> }] > >>>>>>>>>> } > >>>>>>>>>> > >>>>>>>>>> From a SQL perspective, the logical equivalent would be > something > >>>>>>>>> like: > >>>>>>>>>> ============== > >>>>>>>>>> SELECT > >>>>>>>>>> house.address, > >>>>>>>>>> (somehow nest users + double-nest todos under user) > >>>>>>>>>> FROM > >>>>>>>>>> house > >>>>>>>>>> JOIN > >>>>>>>>>> user ON user.house_id = house.id > >>>>>>>>>> todos ON todos.user_id = user.id > >>>>>>>>>> WHERE > >>>>>>>>>> house.id = 1 > >>>>>>>>>> > >>>>>>>>>> I'm not familiar enough with SQL to have figured out a way to > make > >>>>>>>>> this > >>>>>>>>>> kind of > >>>>>>>>>> query using operators that are supported across most of the DB's > >>>>>>>>> Calcite has > >>>>>>>>>> adapters for. > >>>>>>>>>> > >>>>>>>>>> Currently what I have done instead, on a tip from Gopalakrishna > >>>>> Holla > >>>>>>>>> from > >>>>>>>>>> LinkedIn Coral team who has built GraphQL-on-Calcite, was to > break > >>>>> up > >>>>>>>>> the > >>>>>>>>>> query > >>>>>>>>>> into individual parts and then do the join in-memory: > >>>>>>>>>> > >>>>>>>>>> SELECT ... FROM users; > >>>>>>>>>> SELECT ... FROM todos WHERE todos.user_id IN (ResultSet from > prev > >>>>>>>>> response); > >>>>>>>>>> > >>>>>>>>>> However, the way I am doing this seems like it's probably very > >>>>>>>>> inefficient. > >>>>>>>>>> Because I do a series of nested loops to add the key to each > >>>> object > >>>>>>>>> in the > >>>>>>>>>> parent ResultSet row: > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>> > >>>>> > >>>> > >> > https://github.com/GavinRay97/GraphQLCalcite/blob/648e0ac4f6810a3c360d13a03e6597c33406de4b/src/main/kotlin/TableDataFetcher.kt#L135-L153 > >>>>>>>>>> > >>>>>>>>>> Is there some better way of doing this? > >>>>>>>>>> I would be eternally grateful for any advice. > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> On Thu, Feb 10, 2022 at 3:27 PM Julian Hyde < > >>>> [email protected] > >>>>>> > >>>>>>>>> wrote: > >>>>>>>>>> > >>>>>>>>>>> Yes, if you want to do multiple layers of aggregation, use CTEs > >>>>>>>>> (WITH) or > >>>>>>>>>>> nested sub-queries. For example, the following is I believe > valid > >>>>>>>>> standard > >>>>>>>>>>> SQL, and actually computes something useful: > >>>>>>>>>>> > >>>>>>>>>>> WITH q1 AS > >>>>>>>>>>> (SELECT deptno, job, AVG(sal) AS avg_sal > >>>>>>>>>>> FROM emp > >>>>>>>>>>> GROUP BY deptno, job) > >>>>>>>>>>> WITH q2 AS > >>>>>>>>>>> (SELECT deptno, AVG(avg_sal) AS avg_avg_sal > >>>>>>>>>>> FROM q1 > >>>>>>>>>>> GROUP BY deptno) > >>>>>>>>>>> SELECT AVG(avg_avg_sal) > >>>>>>>>>>> FROM q2 > >>>>>>>>>>> GROUP BY () > >>>>>>>>>>> > >>>>>>>>>>> (You can omit the “GROUP BY ()” line, but I think it makes > things > >>>>>>>>> clearer.) > >>>>>>>>>>> > >>>>>>>>>>> Julian > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>>> On Feb 10, 2022, at 12:17 PM, Justin Swanhart < > >>>>> [email protected]> > >>>>>>>>>>> wrote: > >>>>>>>>>>>> > >>>>>>>>>>>> I wish you could unsend emails :) Answering my own question, > >>>> no, > >>>>>>>>> because > >>>>>>>>>>>> that would return three rows with the average :D > >>>>>>>>>>>> > >>>>>>>>>>>> On Thu, Feb 10, 2022 at 3:16 PM Justin Swanhart < > >>>>>>>>> [email protected]> > >>>>>>>>>>> wrote: > >>>>>>>>>>>> > >>>>>>>>>>>>> Just out of curiosity, is the second level aggregation using > >>>> AVG > >>>>>>>>> in a > >>>>>>>>>>>>> window context? It the frame is the whole table and it > >>>>> aggregates > >>>>>>>>> over > >>>>>>>>>>> it? > >>>>>>>>>>>>> > >>>>>>>>>>>>> On Thu, Feb 10, 2022 at 3:12 PM Justin Swanhart < > >>>>>>>>> [email protected]> > >>>>>>>>>>>>> wrote: > >>>>>>>>>>>>> > >>>>>>>>>>>>>> That is really neat about Oracle. > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> The alternative in general is to use a subquery: > >>>>>>>>>>>>>> SELECT avg(avg(sal)) FROM emp GROUP BY deptno; > >>>>>>>>>>>>>> becomes > >>>>>>>>>>>>>> select avg(the_avg) > >>>>>>>>>>>>>> from (select avg(sal) from emp group b deptno) an_alias; > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> or > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> with the_cte as (select avg(sal) x from emp group by deptno) > >>>>>>>>>>>>>> select avg(x) from the_cte; > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> On Thu, Feb 10, 2022 at 3:03 PM Julian Hyde < > >>>>>>>>> [email protected]> > >>>>>>>>>>>>>> wrote: > >>>>>>>>>>>>>> > >>>>>>>>>>>>>>> Some databases, e.g. Oracle, allow TWO levels of nesting: > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> SELECT avg(sal) FROM emp GROUP BY deptno; > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> AVG(SAL) > >>>>>>>>>>>>>>> ======== > >>>>>>>>>>>>>>> 1,566.67 > >>>>>>>>>>>>>>> 2,175.00 > >>>>>>>>>>>>>>> 2,916.65 > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> SELECT avg(avg(sal)) FROM emp GROUP BY deptno; > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> AVG(SUM(SAL)) > >>>>>>>>>>>>>>> ============= > >>>>>>>>>>>>>>> 9,675 > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> The first level aggregates by department (returning 3 > >>>> records), > >>>>>>>>> and > >>>>>>>>>>> the > >>>>>>>>>>>>>>> second level computes the grand total (returning 1 record). > >>>> But > >>>>>>>>> that > >>>>>>>>>>> is an > >>>>>>>>>>>>>>> exceptional case. > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> Generally, any expression in the SELECT or HAVING clause of > >>>> an > >>>>>>>>>>> aggregate > >>>>>>>>>>>>>>> query is either ‘before’ or ‘after’ aggregation. Consider > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> SELECT t.x + 1 AS a, 2 + SUM(t.y + 3) AS b > >>>>>>>>>>>>>>> FROM t > >>>>>>>>>>>>>>> GROUP BY t.x > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> The expressions “t.y” and “t.y + 3” occur before > aggregation; > >>>>>>>>> “t.x”, > >>>>>>>>>>>>>>> “t.x + 1”, “SUM(t.y + 3)” and “2 + SUM(t.y + 3)” occur > after > >>>>>>>>>>> aggregation. > >>>>>>>>>>>>>>> SQL semantics rely heavily on this stratification. Allowing > >>>> an > >>>>>>>>> extra > >>>>>>>>>>> level > >>>>>>>>>>>>>>> of aggregation would mess it all up. > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> Julian > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> On Feb 10, 2022, at 9:45 AM, Justin Swanhart < > >>>>>>>>> [email protected]> > >>>>>>>>>>>>>>> wrote: > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> This is a SQL limitation. > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> mysql> select sum(1); > >>>>>>>>>>>>>>>> +--------+ > >>>>>>>>>>>>>>>> | sum(1) | > >>>>>>>>>>>>>>>> +--------+ > >>>>>>>>>>>>>>>> | 1 | > >>>>>>>>>>>>>>>> +--------+ > >>>>>>>>>>>>>>>> 1 row in set (0.00 sec) > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> mysql> select sum(sum(1)); > >>>>>>>>>>>>>>>> ERROR 1111 (HY000): Invalid use of group function > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> On Thu, Feb 10, 2022 at 12:39 PM Gavin Ray < > >>>>>>>>> [email protected]> > >>>>>>>>>>>>>>> wrote: > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> Went to test this query out and found that it can't be > >>>>>>>>> performed: > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> SELECT > >>>>>>>>>>>>>>>>> JSON_OBJECT( > >>>>>>>>>>>>>>>>> KEY 'users' > >>>>>>>>>>>>>>>>> VALUE JSON_ARRAYAGG( > >>>>>>>>>>>>>>>>> JSON_OBJECT( > >>>>>>>>>>>>>>>>> KEY 'name' VALUE "users"."name", > >>>>>>>>>>>>>>>>> KEY 'todos' VALUE JSON_ARRAYAGG( > >>>>>>>>>>>>>>>>> JSON_OBJECT( > >>>>>>>>>>>>>>>>> KEY 'description' VALUE > >>>>>>>>> "todos"."description" > >>>>>>>>>>>>>>>>> ) > >>>>>>>>>>>>>>>>> ) > >>>>>>>>>>>>>>>>> ) > >>>>>>>>>>>>>>>>> ) > >>>>>>>>>>>>>>>>> ) > >>>>>>>>>>>>>>>>> FROM > >>>>>>>>>>>>>>>>> "users" > >>>>>>>>>>>>>>>>> LEFT OUTER JOIN > >>>>>>>>>>>>>>>>> "todos" ON "users"."id" = "todos"."user_id"; > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> Checking the source, seems this is a blanket policy, not > a > >>>>>>>>>>>>>>>>> datasource-specific thing. > >>>>>>>>>>>>>>>>> From a functional perspective, it doesn't feel like it's > >>>> much > >>>>>>>>>>>>>>> different > >>>>>>>>>>>>>>>>> from JOINs > >>>>>>>>>>>>>>>>> But I don't understand relational theory or DB > >>>> functionality > >>>>>>>>> in the > >>>>>>>>>>>>>>> least, > >>>>>>>>>>>>>>>>> so I'm not fit to judge. > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> Just curious why Calcite doesn't allow this > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>> > >>>>>>>>> > >>>>> > >>>> > >> >
