I hadn't thought about the fact that ORM's probably have to solve this
problem as well
That is a great suggestion, I will try to investigate some of the popular
ORM codebases and see if there are any tricks they are using.
I seem to maybe be getting a tiny bit closer by using subqueries like
Julian suggested instead of operator calls
But if I may ask what is probably a very stupid question:
What might the error message
"parse failed: Query expression encountered in illegal context
(state=,code=0)"
Mean in the below query?
The reason why I am confused is because the query runs if I remove the
innermost subquery ("todos")
But the innermost subquery is a direct copy-paste of the subquery above it,
so I know it MUST be valid
As usual, thank you so much for your help/guidance Stamatis.
select
"g0"."id" "id",
"g0"."address" "address",
(
select json_arrayagg(json_object(
key 'id' value "g1"."id",
key 'todos' value (
select json_arrayagg(json_object(
key 'id' value "g2"."id",
key 'description' value "g2"."description",
))
from (
select * from "todos"
where "g1"."id" = "user_id"
order by "id"
) "g2"
)
))
from (
select * from "users"
where "g0"."id" = "house_id"
order by "id"
) "g1"
) "users"
from "houses" "g0"
order by "g0"."id"
On Mon, Feb 21, 2022 at 8:07 AM Stamatis Zampetakis <[email protected]>
wrote:
> Hi Gavin,
>
> A few more comments in case they help to get you a bit further on your
> work.
>
> The need to return the result as a single object is a common problem in
> object relational mapping (ORM) frameworks/APIS (JPA, Datanucleus,
> Hibernate, etc.). Apart from the suggestions so far maybe you could look
> into these frameworks as well for more inspiration.
>
> Moreover your approach of decomposing the query into individual parts is
> commonly known as the N+1 problem [1].
>
> Lastly, keep in mind that you can introduce custom UDF, UDAF functions if
> you need more flexibility on reconstructing the final result.
>
> Best,
> Stamatis
>
> [1]
>
> https://stackoverflow.com/questions/97197/what-is-the-n1-selects-problem-in-orm-object-relational-mapping
>
> On Sun, Feb 13, 2022 at 3:59 AM Gavin Ray <[email protected]> wrote:
>
> > Ah wait nevermind, got excited and spoke too soon. Looking at it more
> > closely, that data isn't correct.
> > At least it's in somewhat the right shape, ha!
> >
> > On Sat, Feb 12, 2022 at 9:57 PM Gavin Ray <[email protected]> wrote:
> >
> > > After ~5 hours, I think I may have made some progress =)
> > >
> > > I have this, which currently works. The problem is that the nested
> > columns
> > > don't have names on them.
> > > Since I need to return a nested "Map<String, Object>", I have to figure
> > > out how to convert this query into a form that gives column names.
> > >
> > > But this is still great progress I think!
> > >
> > > SELECT
> > > "todos".*,
> > > ARRAY(
> > > SELECT
> > > "users".*,
> > > ARRAY(
> > > SELECT
> > > "todos".*
> > > FROM
> > > "todos"
> > > ) AS "todos"
> > > FROM
> > > "users"
> > > ) AS "users"
> > > FROM
> > > "todos"
> > > WHERE
> > > "user_id" IN (
> > > SELECT
> > > "user_id"
> > > FROM
> > > "users"
> > > WHERE
> > > "house_id" IN (
> > > SELECT
> > > "id"
> > > FROM
> > > "houses"
> > > )
> > > );
> > >
> > >
> > >
> > >
> >
> +----+---------+------------------------+------------------------------------------------------------------------------+
> > > | id | user_id | description |
> > > |
> > >
> > >
> >
> +----+---------+------------------------+------------------------------------------------------------------------------+
> > > | 1 | 1 | Take out the trash | [{1, John, 1, [{1, 1, Take
> out
> > > the trash}, {2, 1, Watch my favorite show}, { |
> > > | 2 | 1 | Watch my favorite show | [{1, John, 1, [{1, 1, Take
> out
> > > the trash}, {2, 1, Watch my favorite show}, { |
> > > | 3 | 1 | Charge my phone | [{1, John, 1, [{1, 1, Take
> out
> > > the trash}, {2, 1, Watch my favorite show}, { |
> > > | 4 | 2 | Cook dinner | [{1, John, 1, [{1, 1, Take
> out
> > > the trash}, {2, 1, Watch my favorite show}, { |
> > > | 5 | 2 | Read a book | [{1, John, 1, [{1, 1, Take
> out
> > > the trash}, {2, 1, Watch my favorite show}, { |
> > > | 6 | 2 | Organize office | [{1, John, 1, [{1, 1, Take
> out
> > > the trash}, {2, 1, Watch my favorite show}, { |
> > > | 7 | 3 | Walk the dog | [{1, John, 1, [{1, 1, Take
> out
> > > the trash}, {2, 1, Watch my favorite show}, { |
> > > | 8 | 3 | Feed the cat | [{1, John, 1, [{1, 1, Take
> out
> > > the trash}, {2, 1, Watch my favorite show}, { |
> > >
> > >
> >
> +----+---------+------------------------+------------------------------------------------------------------------------+
> > >
> > > On Sat, Feb 12, 2022 at 4:13 PM Gavin Ray <[email protected]>
> wrote:
> > >
> > >> Nevermind, this is a standard term not something Calcite-specific it
> > >> seems!
> > >>
> > >> https://en.wikipedia.org/wiki/Correlated_subquery
> > >>
> > >> On Sat, Feb 12, 2022 at 3:46 PM Gavin Ray <[email protected]>
> > wrote:
> > >>
> > >>> Forgive my ignorance/lack of experience
> > >>>
> > >>> I am somewhat familiar with the ARRAY() function, but not sure I know
> > >>> the term "correlated"
> > >>> Searching the Calcite codebase for uses of "correlated" + "query", I
> > >>> found:
> > >>>
> > >>>
> > >>>
> >
> https://github.com/apache/calcite/blob/1d4f1b394bfdba03c5538017e12ab2431b137ca9/core/src/test/java/org/apache/calcite/test/SqlToRelConverterTest.java#L1603-L1612
> > >>>
> > >>> @Test void testCorrelatedSubQueryInJoin() {
> > >>> final String sql = "select *\n"
> > >>> + "from emp as e\n"
> > >>> + "join dept as d using (deptno)\n"
> > >>> + "where d.name = (\n"
> > >>> + " select max(name)\n"
> > >>> + " from dept as d2\n"
> > >>> + " where d2.deptno = d.deptno)";
> > >>> sql(sql).withExpand(false).ok();
> > >>> }
> > >>>
> > >>> But I also see this, which says it is "uncorrelated" but seems very
> > >>> similar?
> > >>>
> > >>> @Test void testInUncorrelatedSubQuery() {
> > >>> final String sql = "select empno from emp where deptno in"
> > >>> + " (select deptno from dept)";
> > >>> sql(sql).ok();
> > >>> }
> > >>>
> > >>> I wouldn't blame you for not answering such a basic question -- but
> > what
> > >>> exactly does "correlation" mean here?
> > >>>
> > >>> Thanks, as usual Julian
> > >>>
> > >>>
> > >>> On Sat, Feb 12, 2022 at 3:08 PM Julian Hyde <[email protected]>
> > >>> wrote:
> > >>>
> > >>>> Correlated ARRAY sub-query?
> > >>>>
> > >>>> > On Feb 12, 2022, at 10:40 AM, Gavin Ray <[email protected]>
> > >>>> wrote:
> > >>>> >
> > >>>> > Apologies for the delay in replying
> > >>>> >
> > >>>> > This makes things clear and seems obvious now that you point it
> out.
> > >>>> > Thank you, Justin and Julian =)
> > >>>> >
> > >>>> > Let me ask another question (if I may) that I am struggling to
> > phrase
> > >>>> > easily.
> > >>>> >
> > >>>> > So with GraphQL, you might have a query like:
> > >>>> > - "Get houses"
> > >>>> > - "For each house get the user that lives in the house
> > >>>> > - "And for each user get their list of todos"
> > >>>> >
> > >>>> > The result has to come back such that it's a single object for
> each
> > >>>> row
> > >>>> > ==============
> > >>>> > {
> > >>>> > houses: [{
> > >>>> > address: "123 Main Street",
> > >>>> > users: [{
> > >>>> > name: "Joe",
> > >>>> > todos: [{
> > >>>> > description: "Take out trash"
> > >>>> > }]
> > >>>> > }]
> > >>>> > }
> > >>>> >
> > >>>> > From a SQL perspective, the logical equivalent would be something
> > >>>> like:
> > >>>> > ==============
> > >>>> > SELECT
> > >>>> > house.address,
> > >>>> > (somehow nest users + double-nest todos under user)
> > >>>> > FROM
> > >>>> > house
> > >>>> > JOIN
> > >>>> > user ON user.house_id = house.id
> > >>>> > todos ON todos.user_id = user.id
> > >>>> > WHERE
> > >>>> > house.id = 1
> > >>>> >
> > >>>> > I'm not familiar enough with SQL to have figured out a way to make
> > >>>> this
> > >>>> > kind of
> > >>>> > query using operators that are supported across most of the DB's
> > >>>> Calcite has
> > >>>> > adapters for.
> > >>>> >
> > >>>> > Currently what I have done instead, on a tip from Gopalakrishna
> > Holla
> > >>>> from
> > >>>> > LinkedIn Coral team who has built GraphQL-on-Calcite, was to break
> > up
> > >>>> the
> > >>>> > query
> > >>>> > into individual parts and then do the join in-memory:
> > >>>> >
> > >>>> > SELECT ... FROM users;
> > >>>> > SELECT ... FROM todos WHERE todos.user_id IN (ResultSet from prev
> > >>>> response);
> > >>>> >
> > >>>> > However, the way I am doing this seems like it's probably very
> > >>>> inefficient.
> > >>>> > Because I do a series of nested loops to add the key to each
> object
> > >>>> in the
> > >>>> > parent ResultSet row:
> > >>>> >
> > >>>> >
> > >>>>
> >
> https://github.com/GavinRay97/GraphQLCalcite/blob/648e0ac4f6810a3c360d13a03e6597c33406de4b/src/main/kotlin/TableDataFetcher.kt#L135-L153
> > >>>> >
> > >>>> > Is there some better way of doing this?
> > >>>> > I would be eternally grateful for any advice.
> > >>>> >
> > >>>> >
> > >>>> > On Thu, Feb 10, 2022 at 3:27 PM Julian Hyde <
> [email protected]
> > >
> > >>>> wrote:
> > >>>> >
> > >>>> >> Yes, if you want to do multiple layers of aggregation, use CTEs
> > >>>> (WITH) or
> > >>>> >> nested sub-queries. For example, the following is I believe valid
> > >>>> standard
> > >>>> >> SQL, and actually computes something useful:
> > >>>> >>
> > >>>> >> WITH q1 AS
> > >>>> >> (SELECT deptno, job, AVG(sal) AS avg_sal
> > >>>> >> FROM emp
> > >>>> >> GROUP BY deptno, job)
> > >>>> >> WITH q2 AS
> > >>>> >> (SELECT deptno, AVG(avg_sal) AS avg_avg_sal
> > >>>> >> FROM q1
> > >>>> >> GROUP BY deptno)
> > >>>> >> SELECT AVG(avg_avg_sal)
> > >>>> >> FROM q2
> > >>>> >> GROUP BY ()
> > >>>> >>
> > >>>> >> (You can omit the “GROUP BY ()” line, but I think it makes things
> > >>>> clearer.)
> > >>>> >>
> > >>>> >> Julian
> > >>>> >>
> > >>>> >>
> > >>>> >>
> > >>>> >>> On Feb 10, 2022, at 12:17 PM, Justin Swanhart <
> > [email protected]>
> > >>>> >> wrote:
> > >>>> >>>
> > >>>> >>> I wish you could unsend emails :) Answering my own question,
> no,
> > >>>> because
> > >>>> >>> that would return three rows with the average :D
> > >>>> >>>
> > >>>> >>> On Thu, Feb 10, 2022 at 3:16 PM Justin Swanhart <
> > >>>> [email protected]>
> > >>>> >> wrote:
> > >>>> >>>
> > >>>> >>>> Just out of curiosity, is the second level aggregation using
> AVG
> > >>>> in a
> > >>>> >>>> window context? It the frame is the whole table and it
> > aggregates
> > >>>> over
> > >>>> >> it?
> > >>>> >>>>
> > >>>> >>>> On Thu, Feb 10, 2022 at 3:12 PM Justin Swanhart <
> > >>>> [email protected]>
> > >>>> >>>> wrote:
> > >>>> >>>>
> > >>>> >>>>> That is really neat about Oracle.
> > >>>> >>>>>
> > >>>> >>>>> The alternative in general is to use a subquery:
> > >>>> >>>>> SELECT avg(avg(sal)) FROM emp GROUP BY deptno;
> > >>>> >>>>> becomes
> > >>>> >>>>> select avg(the_avg)
> > >>>> >>>>> from (select avg(sal) from emp group b deptno) an_alias;
> > >>>> >>>>>
> > >>>> >>>>> or
> > >>>> >>>>>
> > >>>> >>>>> with the_cte as (select avg(sal) x from emp group by deptno)
> > >>>> >>>>> select avg(x) from the_cte;
> > >>>> >>>>>
> > >>>> >>>>> On Thu, Feb 10, 2022 at 3:03 PM Julian Hyde <
> > >>>> [email protected]>
> > >>>> >>>>> wrote:
> > >>>> >>>>>
> > >>>> >>>>>> Some databases, e.g. Oracle, allow TWO levels of nesting:
> > >>>> >>>>>>
> > >>>> >>>>>> SELECT avg(sal) FROM emp GROUP BY deptno;
> > >>>> >>>>>>
> > >>>> >>>>>> AVG(SAL)
> > >>>> >>>>>> ========
> > >>>> >>>>>> 1,566.67
> > >>>> >>>>>> 2,175.00
> > >>>> >>>>>> 2,916.65
> > >>>> >>>>>>
> > >>>> >>>>>> SELECT avg(avg(sal)) FROM emp GROUP BY deptno;
> > >>>> >>>>>>
> > >>>> >>>>>> AVG(SUM(SAL))
> > >>>> >>>>>> =============
> > >>>> >>>>>> 9,675
> > >>>> >>>>>>
> > >>>> >>>>>> The first level aggregates by department (returning 3
> records),
> > >>>> and
> > >>>> >> the
> > >>>> >>>>>> second level computes the grand total (returning 1 record).
> But
> > >>>> that
> > >>>> >> is an
> > >>>> >>>>>> exceptional case.
> > >>>> >>>>>>
> > >>>> >>>>>> Generally, any expression in the SELECT or HAVING clause of
> an
> > >>>> >> aggregate
> > >>>> >>>>>> query is either ‘before’ or ‘after’ aggregation. Consider
> > >>>> >>>>>>
> > >>>> >>>>>> SELECT t.x + 1 AS a, 2 + SUM(t.y + 3) AS b
> > >>>> >>>>>> FROM t
> > >>>> >>>>>> GROUP BY t.x
> > >>>> >>>>>>
> > >>>> >>>>>> The expressions “t.y” and “t.y + 3” occur before aggregation;
> > >>>> “t.x”,
> > >>>> >>>>>> “t.x + 1”, “SUM(t.y + 3)” and “2 + SUM(t.y + 3)” occur after
> > >>>> >> aggregation.
> > >>>> >>>>>> SQL semantics rely heavily on this stratification. Allowing
> an
> > >>>> extra
> > >>>> >> level
> > >>>> >>>>>> of aggregation would mess it all up.
> > >>>> >>>>>>
> > >>>> >>>>>> Julian
> > >>>> >>>>>>
> > >>>> >>>>>>
> > >>>> >>>>>>
> > >>>> >>>>>>> On Feb 10, 2022, at 9:45 AM, Justin Swanhart <
> > >>>> [email protected]>
> > >>>> >>>>>> wrote:
> > >>>> >>>>>>>
> > >>>> >>>>>>> This is a SQL limitation.
> > >>>> >>>>>>>
> > >>>> >>>>>>> mysql> select sum(1);
> > >>>> >>>>>>> +--------+
> > >>>> >>>>>>> | sum(1) |
> > >>>> >>>>>>> +--------+
> > >>>> >>>>>>> | 1 |
> > >>>> >>>>>>> +--------+
> > >>>> >>>>>>> 1 row in set (0.00 sec)
> > >>>> >>>>>>>
> > >>>> >>>>>>> mysql> select sum(sum(1));
> > >>>> >>>>>>> ERROR 1111 (HY000): Invalid use of group function
> > >>>> >>>>>>>
> > >>>> >>>>>>> On Thu, Feb 10, 2022 at 12:39 PM Gavin Ray <
> > >>>> [email protected]>
> > >>>> >>>>>> wrote:
> > >>>> >>>>>>>
> > >>>> >>>>>>>> Went to test this query out and found that it can't be
> > >>>> performed:
> > >>>> >>>>>>>>
> > >>>> >>>>>>>> SELECT
> > >>>> >>>>>>>> JSON_OBJECT(
> > >>>> >>>>>>>> KEY 'users'
> > >>>> >>>>>>>> VALUE JSON_ARRAYAGG(
> > >>>> >>>>>>>> JSON_OBJECT(
> > >>>> >>>>>>>> KEY 'name' VALUE "users"."name",
> > >>>> >>>>>>>> KEY 'todos' VALUE JSON_ARRAYAGG(
> > >>>> >>>>>>>> JSON_OBJECT(
> > >>>> >>>>>>>> KEY 'description' VALUE
> > >>>> "todos"."description"
> > >>>> >>>>>>>> )
> > >>>> >>>>>>>> )
> > >>>> >>>>>>>> )
> > >>>> >>>>>>>> )
> > >>>> >>>>>>>> )
> > >>>> >>>>>>>> FROM
> > >>>> >>>>>>>> "users"
> > >>>> >>>>>>>> LEFT OUTER JOIN
> > >>>> >>>>>>>> "todos" ON "users"."id" = "todos"."user_id";
> > >>>> >>>>>>>>
> > >>>> >>>>>>>> Checking the source, seems this is a blanket policy, not a
> > >>>> >>>>>>>> datasource-specific thing.
> > >>>> >>>>>>>> From a functional perspective, it doesn't feel like it's
> much
> > >>>> >>>>>> different
> > >>>> >>>>>>>> from JOINs
> > >>>> >>>>>>>> But I don't understand relational theory or DB
> functionality
> > >>>> in the
> > >>>> >>>>>> least,
> > >>>> >>>>>>>> so I'm not fit to judge.
> > >>>> >>>>>>>>
> > >>>> >>>>>>>> Just curious why Calcite doesn't allow this
> > >>>> >>>>>>>>
> > >>>> >>>>>>
> > >>>> >>>>>>
> > >>>> >>
> > >>>> >>
> > >>>>
> > >>>>
> >
>