The syntax/behaviour seems kinda unusual to me - do any other systems aside
from postgres implement this? UNNEST() in the from clause seems more
common, and it is generally not consistent with the rest of SQL for SELECT
clause entries to affect the set of rows returned.

I'm not sure but it seems like it might be tricky to make the select clause
in the Impala frontend do this.

Some way of doing this in FROM clause seems more intuitive and consistent
with the rest of sql, e.g.

select id, e1, e2
from tbl t, UNNEST(t.arr1, t.arr2) (e1, e2)

I'm tempted to say that it should be called ZIP() since that's the name of
this operation in functional programming, but I don't think inventing new
SQL keywords is really ideal.

I'm also a little curious about the use case for this. I think ideally you
would represent two arrays with correlated indices as an array of structs
instead of two arrays, but I can appreciate that schema designs may not
always be fully normalised.

On Fri, 10 Sept 2021 at 18:38, Aman Sinha <amansi...@gmail.com> wrote:

> Hi Gabor,
> An implicit assumption in the proposed syntax is that there is some
> association between an item at a particular position in the first array and
> the same position in the second array.
> However,  in general the arrays are not ordered in a semantic sense.  e.g
> car_make: {"Honda", "Toyota"},   car_model: {"Prius", "Accord"}.   The
> proposed UNNEST behavior will not produce any rows where {Honda, Accord}
> appear together which would be unexpected.  Although one could make the
> argument that in order to get this behavior you could fall back to the
> current syntax of putting the arrays in the FROM clause, it could be
> confusing to explain when one should be used vs other.  It would be useful
> to see what other systems (apart from Postgres) which have more native
> complex types support do in this case (e.g Presto, Couchbase, Bigquery).
>
> One alternative I can think of is to make the association between the items
> in multiple arrays be more explicit e.g by using an outer join on the
> ordinal position. That way, items at the same position appear together in a
> row and for non-matches we produce Null for the smaller array. I haven't
> thought through the exact syntax for this yet.
>
> -Aman
>
> On Fri, Sep 10, 2021 at 11:04 AM Gabor Kaszab <gaborkas...@cloudera.com>
> wrote:
>
> > Hey,
> >
> > I'm working on implementing a new UNNEST() operation that does what it
> > says, unnests arrays. But here the trick would come when there are
> multiple
> > UNNESTS() in the select list. Let me explain the desired behaviour with
> an
> > example:
> >
> > Let's assume the following table:
> > [image: Screenshot 2021-09-10 at 19.58.53.png]
> > An the following query:
> > SELECT id, UNNEST(arr1), UNNEST(arr2) FROM tbl;
> >
> > Here the expected result is to not produce a join on the two arrays as we
> > would with the original way of giving them in the select lit, but to put
> > the values of the arrays "next to each other" like this:
> > [image: Screenshot 2021-09-10 at 19.59.04.png]
> > I have investigated some options how this could be implemented and
> created
> > a doc for my notes. Let me share this with the community so that people
> can
> > share their preference. Any feedback is welcome!
> >
> >
> >
> https://docs.google.com/document/d/184EKJwMME4SNzyfOTueI-nz-IL-WUiBeaS8Zhi2XzMo/edit?usp=sharing
> >
> > Cheers,
> > Gabor
> >
> >
> >
>

Reply via email to