Re: Support unnest multiple array columns

wmy7ymw Wed, 12 Feb 2020 12:20:14 -0800

On Wed, Feb 12, 2020 at 11:30 AM Julian Hyde <[email protected]> wrote:


> It’s very possible that there is a bug. This functionality is not well
> tested. Can you please log it.
>
Will do. I also test the same thing in Presto: (CROSS JOIN UNNEST(array) as
a CROSS JOIN UNNEST(another_array) as b) and it also yields a cartesian
product. But anyway I logged here
<https://issues.apache.org/jira/browse/CALCITE-3787> as a ref.

>
> I don’t particularly see the need for Presto’s SQL extension but I have no
> objection to adding it under a conformance flag. I believe we can implement
> it as a rewrite (i.e. syntactic sugar). Let’s do that as a separate change.
>
Make sense. Parsing is not an issue for us right now. Better to keep it
simple right now.

>
> For what it’s worth, I find this UNNEST a lot less odious than Hive’s
> UNNEST in the SELECT clause. Other than DISTINCT and aggregate functions,
> nothing you put in the SELECT clause should affect the number of rows
> coming out of the query.
>
Hmm, man, there seems no UNNEST in Hive. But maybe it's introduced in a new
version.

>
> Julian
>
>
> > On Feb 12, 2020, at 10:56 AM, wmy7ymw <[email protected]> wrote:
> >
> > Hi Julian,
> >
> > Thanks for the reply. Tried it with SqlToRelConverter and it seems to be
> > Cartesian product but not the desired behavior:
> >
> >> @Test public void testUnnestArrayPlan() {
> >>    final String sql = "select d.deptno, e2.empno, e3.detail\n"
> >>        + "from dept_nested as d,\n"
> >>        + " UNNEST(d.employees) e2, UNNEST(d.employees) e3";
> >>    sql(sql).with(getExtendedTester()).ok();
> >>  }
> >>
> >
> >> @Test public void testUnnestArrayPlan() {
> >>    final String sql = "select d.deptno, e2.empno, e3.detail\n"
> >>        + "from dept_nested as d CROSS JOIN\n"
> >>        + " UNNEST(d.employees) e2 CROSS JOIN UNNEST(d.employees) e3";
> >>    sql(sql).with(getExtendedTester()).ok();
> >>  }
> >
> >
> > and both of them yield a plan like
> >
> > LogicalProject(DEPTNO=[$0], EMPNO=[$7], DETAIL=[ROW($12)])
> >>  LogicalCorrelate(correlation=[$cor1], joinType=[inner],
> >> requiredColumns=[{6}])
> >>    LogicalCorrelate(correlation=[$cor0], joinType=[inner],
> >> requiredColumns=[{6}])
> >>      LogicalProject(DEPTNO=[$0], NAME=[$1], TYPE=[$2.TYPE],
> >> DESC=[$2.DESC], A=[$2.OTHERS.A], B=[$2.OTHERS.B], EMPLOYEES=[$3])
> >>        LogicalTableScan(table=[[CATALOG, SALES, DEPT_NESTED]])
> >>      Uncollect
> >>        LogicalProject(EMPLOYEES=[$cor0.EMPLOYEES_6])
> >>          LogicalValues(tuples=[[{ 0 }]])
> >>    Uncollect
> >>      LogicalProject(EMPLOYEES=[$cor1.EMPLOYEES_6])
> >>        LogicalValues(tuples=[[{ 0 }]])
> >
> >
> > I only see the desired behavior in Presto, and we are currently using
> > Calcite to validate (some of) Presto queries. Except the behavior above,
> > Calcite and Presto treat the following things for structured types
> > differently:
> >
> >> CROSS JOIN UNNEST(array_of_structures_with_properties_a_b_and_c) AS t
> >
> > in Calcite is equivalent to
> >
> >> CROSS JOIN UNNEST(array_of_structures_with_properties_a_b_and_c) AS
> >> some_nonused_identifier(t)
> >>
> >
> > To resolve such differences, we would like to implement a new conformance
> > and validation logics to reflect differences above.
> >
> > Regarding Sql To Rel, I am still not sure what SHOULD BE the right
> logical
> > plan for the desired behavior given the above example. Let me know.
> >
> > Thanks,
> > Will
> >
> > On Tue, Feb 11, 2020 at 11:11 PM Julian Hyde <[email protected]> wrote:
> >
> >> I haven't tried it, but I'd guess that you join to each separately,
> >> i.e. take the cartesian product:
> >>
> >> SELECT numbers, animals, n, a
> >> FROM (
> >>  VALUES
> >>    (ARRAY[2, 5], ARRAY['dog', 'cat', 'bird']),
> >>    (ARRAY[7, 8, 9], ARRAY['cow', 'pig'])
> >> ) AS x (numbers, animals)
> >> CROSS JOIN UNNEST (numbers) AS t1 (n)
> >> CROSS JOIN UNNEST (animals) AS t2 (a)
> >>
> >> This doesn't require an extension to standard SQL.
> >>
> >> On Tue, Feb 11, 2020 at 11:07 PM Miao Yu <[email protected]> wrote:
> >>>
> >>> Hi Calcite experts:
> >>>
> >>> In Presto, multiple array columns could be flattened "together", as
> >> Presto
> >>> docs <https://prestodb.io/docs/current/sql/select.html> shows:
> >>>
> >>> SELECT numbers, animals, n, aFROM (
> >>>>  VALUES
> >>>>    (ARRAY[2, 5], ARRAY['dog', 'cat', 'bird']),
> >>>>    (ARRAY[7, 8, 9], ARRAY['cow', 'pig'])) AS x (numbers,
> >> animals)CROSS JOIN UNNEST(numbers, animals) AS t (n, a)
> >>>>
> >>>> yields:
> >>>
> >>>  numbers  |     animals      |  n   |  a
> >>> -----------+------------------+------+------
> >>> [2, 5]    | [dog, cat, bird] |    2 | dog
> >>> [2, 5]    | [dog, cat, bird] |    5 | cat
> >>> [2, 5]    | [dog, cat, bird] | NULL | bird
> >>> [7, 8, 9] | [cow, pig]       |    7 | cow
> >>> [7, 8, 9] | [cow, pig]       |    8 | pig
> >>> [7, 8, 9] | [cow, pig]       |    9 | NULL
> >>>
> >>>
> >>> May I know what is the equivalent in Calcite SQL?
> >>>
> >>> Thanks!
> >>> Will
> >>
>
>

Re: Support unnest multiple array columns

Reply via email to