Thanks for sharing your view, Tim! And sorry for the late answer. *Feature flag:* The more thought I give this the more I feel that using a feature flag indeed would be confusing. Additionally, I like the idea of having a special syntax for the zipping unnest.
*Syntax:* I checked the SQL standard and it says something really similar to your suggestion: Example from the standard: SELECT M.E FROM UNNEST (MVE) AS M(E) <collection derived table> ::= UNNEST <left paren> <collection value expression> [ { <comma> <collection value expression> }... ] <right paren> [ WITH ORDINALITY ] And you can specify aliases for the <collection derived table>s with the AS keyword and even the standard mentions multiple collection parameters for the unnest(). So let's have the query rewrite this way: SELECT UNNEST(t.a1), UNNEST(t.a2) FROM t; ==> SELECT _tmp1, _tmp2 FROM t, UNNEST(t.a1, t.a2) AS (_tmp1, tmp2); *Arrays from multiple tables:* I checked in Postgres and apparently you can unnest arrays from multiple tables and they still get zipped. I think for the first iteration this would be overkill to implement as Impala puts the arrays from different tables into a different subplan making it a bit more complex to implement, so I'd just restrict unnest() to a single table at this point. *UnnestTableRef:* I also thought that these unnests in the FROM clause should go into a kind of TableRef (e.g. UnnestTableRef?, ZippingUnnestTableRef) but there I found that each TableRef has a single rawPath_ and this might not be the best for a single TableRef descendant holding multiple arrays with multiple rawPath_s. But this is not a big issue, I'm sure there is another way to get around this. Cheers, Gabor On Tue, Sep 14, 2021 at 7:36 AM Tim Armstrong <tim.g.armstr...@gmail.com> wrote: > I think compatibility with Postgres derivatives generally makes sense as a > goal, even if the behaviour isn't ideal - definitely much easier than > rewriting everything to a different syntax. > > With your rewrite proposal, would you also be able to express these queries > by unnesting in the FROM clause? > > SELECT unnest(t.a) c1, unnest(t.b) c1 > FROM tbl t > => > SELECT _tmp1 c1, _tmp2 c2 > FROM tbl t, UNNEST(t.a, t.b) (_tmp1, _tmp2) > > I'm not sure a feature flag would work well for this, since you > might potentially have a more complex FROM clause with other unnesting and > I'm not sure how you would model that. I suspect it works out cleaner to > model the zipped arrays as a TableRef - for most purposes in analysis and > planning it's going to be treated the same as a TableRef that is just a > plain array of structs - the analysis/planner code for TableRefs, joins, > etc isn't the easiest to extend. > > I guess you could also rewrite it to something like this too, which would > really let you isolate the special handling to a simple SELECT block: > SELECT v._tmp1 c1, v._tmp2 c2 > FROM tbl t, (SELECT t.a __tmp1, t.b __tmp2 FROM t.a ZIP JOIN t.b) v > > Another question: what happens if the two arrays are from different table > references? Is that allowed? Do they still get zipped together? > > On Mon, 13 Sept 2021 at 04:43, Gabor Kaszab <gaborkas...@apache.org> > wrote: > > > Hey All, > > Thanks for taking a look and sharing your view. The whole idea of having > > multiple UNNEST() functions in the select list (and zipping the results) > > came from a user as a direct feature request as they try to move to > Impala > > from Greenplum and as they say they can't rewrite their existing queries. > > > > The most straightforward way would be to implement UNNEST() as a UDTF but > > Impala doesn't support them, and anyway, as mentioned above Impala > doesn't > > support even UDFs with nested type parameters. > > > > My idea was something like introducing an unnest SlotRef or such, change > > the SQL parser to translate each UNNEST(array) into this SlotRef and then > > do a query rewrite based on that so that we can process the unnest. My > > preference would be to rewrite the query to the syntax we have now with > > joining the arrays in the FROM clause as collection table refs and then > > have a feature flag (e.g. unnest_join_type) to indicate whether we want > the > > joined or the zipped behaviour. For the zipped one we can modify the > > existing UNNEST node to be able to receive more arrays and then do the > zip > > itself. > > > > What do you think? > > > > Gabor > > > > On Mon, Sep 13, 2021 at 9:35 AM 彭斗 <pengdou1...@126.com> wrote: > > > > > Hi All, > > > > > > > > > Currently IMPALA don't support complex data type as input/output > > > parameters for Bulltin Functions / User Defined Functions, > > > If IMPALA can support this, the Unnest() or other operation on > complex > > > data type may easy to achieve. > > > > > > > > > > > > > > > -- > > > > > > Best Regards > > > Dou > > > > > > > > > > > > > > > > > > At 2021-09-13 15:08:52, "Quanlong Huang" <huangquanl...@gmail.com> > > wrote: > > > >This seems like the explode() UDTF in Hive: > > > > > > > > > > https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-explode(array) > > > > > > > >Hive doesn't support multiple UDTF in the SelectList. I think we can > > make > > > >the same choice that only supports one UNNEST() in the SelectList. If > > > >multiple columns need to be unfolded, they should be wrapped in the > same > > > >UNNEST() function with a keyword/func specifying how to join the rows. > > > E.g. > > > > > > > >-- cross join by default > > > >select UNNEST(array_a, array_b) from tbl; > > > >-- specifying the join type > > > >select UNNEST(zip array_a, array_b) from tbl; > > > >select UNNEST(arrays_zip(array_a, array_b)) from tbl; > > > > > > > >explode() in Hive only supports one column. The above syntax is > inspired > > > by > > > >SparkSQL: > > > > > > > > > > https://stackoverflow.com/questions/33220916/explode-transpose-multiple-columns-in-spark-sql-table > > > >https://spark.apache.org/docs/latest/api/sql/index.html#arrays_zip > > > > > > > >On a separate note, what about using the same name(explode) as Hive? I > > > >think it'd be helpful for migrations into Impala, especially if we > also > > > >support the LATERAL VIEW syntax. > > > > > > > >Thanks, > > > >Quanlong > > > > > > > >On Sun, Sep 12, 2021 at 8:56 AM Tim Armstrong < > > tim.g.armstr...@gmail.com> > > > >wrote: > > > > > > > >> The syntax/behaviour seems kinda unusual to me - do any other > systems > > > aside > > > >> from postgres implement this? UNNEST() in the from clause seems more > > > >> common, and it is generally not consistent with the rest of SQL for > > > SELECT > > > >> clause entries to affect the set of rows returned. > > > >> > > > >> I'm not sure but it seems like it might be tricky to make the select > > > clause > > > >> in the Impala frontend do this. > > > >> > > > >> Some way of doing this in FROM clause seems more intuitive and > > > consistent > > > >> with the rest of sql, e.g. > > > >> > > > >> select id, e1, e2 > > > >> from tbl t, UNNEST(t.arr1, t.arr2) (e1, e2) > > > >> > > > >> I'm tempted to say that it should be called ZIP() since that's the > > name > > > of > > > >> this operation in functional programming, but I don't think > inventing > > > new > > > >> SQL keywords is really ideal. > > > >> > > > >> I'm also a little curious about the use case for this. I think > ideally > > > you > > > >> would represent two arrays with correlated indices as an array of > > > structs > > > >> instead of two arrays, but I can appreciate that schema designs may > > not > > > >> always be fully normalised. > > > >> > > > >> On Fri, 10 Sept 2021 at 18:38, Aman Sinha <amansi...@gmail.com> > > wrote: > > > >> > > > >> > Hi Gabor, > > > >> > An implicit assumption in the proposed syntax is that there is > some > > > >> > association between an item at a particular position in the first > > > array > > > >> and > > > >> > the same position in the second array. > > > >> > However, in general the arrays are not ordered in a semantic > sense. > > > e.g > > > >> > car_make: {"Honda", "Toyota"}, car_model: {"Prius", "Accord"}. > > The > > > >> > proposed UNNEST behavior will not produce any rows where {Honda, > > > Accord} > > > >> > appear together which would be unexpected. Although one could > make > > > the > > > >> > argument that in order to get this behavior you could fall back to > > the > > > >> > current syntax of putting the arrays in the FROM clause, it could > be > > > >> > confusing to explain when one should be used vs other. It would > be > > > >> useful > > > >> > to see what other systems (apart from Postgres) which have more > > native > > > >> > complex types support do in this case (e.g Presto, Couchbase, > > > Bigquery). > > > >> > > > > >> > One alternative I can think of is to make the association between > > the > > > >> items > > > >> > in multiple arrays be more explicit e.g by using an outer join on > > the > > > >> > ordinal position. That way, items at the same position appear > > together > > > >> in a > > > >> > row and for non-matches we produce Null for the smaller array. I > > > haven't > > > >> > thought through the exact syntax for this yet. > > > >> > > > > >> > -Aman > > > >> > > > > >> > On Fri, Sep 10, 2021 at 11:04 AM Gabor Kaszab < > > > gaborkas...@cloudera.com> > > > >> > wrote: > > > >> > > > > >> > > Hey, > > > >> > > > > > >> > > I'm working on implementing a new UNNEST() operation that does > > what > > > it > > > >> > > says, unnests arrays. But here the trick would come when there > are > > > >> > multiple > > > >> > > UNNESTS() in the select list. Let me explain the desired > behaviour > > > with > > > >> > an > > > >> > > example: > > > >> > > > > > >> > > Let's assume the following table: > > > >> > > [image: Screenshot 2021-09-10 at 19.58.53.png] > > > >> > > An the following query: > > > >> > > SELECT id, UNNEST(arr1), UNNEST(arr2) FROM tbl; > > > >> > > > > > >> > > Here the expected result is to not produce a join on the two > > arrays > > > as > > > >> we > > > >> > > would with the original way of giving them in the select lit, > but > > to > > > >> put > > > >> > > the values of the arrays "next to each other" like this: > > > >> > > [image: Screenshot 2021-09-10 at 19.59.04.png] > > > >> > > I have investigated some options how this could be implemented > and > > > >> > created > > > >> > > a doc for my notes. Let me share this with the community so that > > > people > > > >> > can > > > >> > > share their preference. Any feedback is welcome! > > > >> > > > > > >> > > > > > >> > > > > > >> > > > > >> > > > > > > https://docs.google.com/document/d/184EKJwMME4SNzyfOTueI-nz-IL-WUiBeaS8Zhi2XzMo/edit?usp=sharing > > > >> > > > > > >> > > Cheers, > > > >> > > Gabor > > > >> > > > > > >> > > > > > >> > > > > > >> > > > > >> > > > > > >