Hi All,
Currently IMPALA don't support complex data type as input/output parameters for Bulltin Functions / User Defined Functions, If IMPALA can support this, the Unnest() or other operation on complex data type may easy to achieve. -- Best Regards Dou At 2021-09-13 15:08:52, "Quanlong Huang" <huangquanl...@gmail.com> wrote: >This seems like the explode() UDTF in Hive: >https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-explode(array) > >Hive doesn't support multiple UDTF in the SelectList. I think we can make >the same choice that only supports one UNNEST() in the SelectList. If >multiple columns need to be unfolded, they should be wrapped in the same >UNNEST() function with a keyword/func specifying how to join the rows. E.g. > >-- cross join by default >select UNNEST(array_a, array_b) from tbl; >-- specifying the join type >select UNNEST(zip array_a, array_b) from tbl; >select UNNEST(arrays_zip(array_a, array_b)) from tbl; > >explode() in Hive only supports one column. The above syntax is inspired by >SparkSQL: >https://stackoverflow.com/questions/33220916/explode-transpose-multiple-columns-in-spark-sql-table >https://spark.apache.org/docs/latest/api/sql/index.html#arrays_zip > >On a separate note, what about using the same name(explode) as Hive? I >think it'd be helpful for migrations into Impala, especially if we also >support the LATERAL VIEW syntax. > >Thanks, >Quanlong > >On Sun, Sep 12, 2021 at 8:56 AM Tim Armstrong <tim.g.armstr...@gmail.com> >wrote: > >> The syntax/behaviour seems kinda unusual to me - do any other systems aside >> from postgres implement this? UNNEST() in the from clause seems more >> common, and it is generally not consistent with the rest of SQL for SELECT >> clause entries to affect the set of rows returned. >> >> I'm not sure but it seems like it might be tricky to make the select clause >> in the Impala frontend do this. >> >> Some way of doing this in FROM clause seems more intuitive and consistent >> with the rest of sql, e.g. >> >> select id, e1, e2 >> from tbl t, UNNEST(t.arr1, t.arr2) (e1, e2) >> >> I'm tempted to say that it should be called ZIP() since that's the name of >> this operation in functional programming, but I don't think inventing new >> SQL keywords is really ideal. >> >> I'm also a little curious about the use case for this. I think ideally you >> would represent two arrays with correlated indices as an array of structs >> instead of two arrays, but I can appreciate that schema designs may not >> always be fully normalised. >> >> On Fri, 10 Sept 2021 at 18:38, Aman Sinha <amansi...@gmail.com> wrote: >> >> > Hi Gabor, >> > An implicit assumption in the proposed syntax is that there is some >> > association between an item at a particular position in the first array >> and >> > the same position in the second array. >> > However, in general the arrays are not ordered in a semantic sense. e.g >> > car_make: {"Honda", "Toyota"}, car_model: {"Prius", "Accord"}. The >> > proposed UNNEST behavior will not produce any rows where {Honda, Accord} >> > appear together which would be unexpected. Although one could make the >> > argument that in order to get this behavior you could fall back to the >> > current syntax of putting the arrays in the FROM clause, it could be >> > confusing to explain when one should be used vs other. It would be >> useful >> > to see what other systems (apart from Postgres) which have more native >> > complex types support do in this case (e.g Presto, Couchbase, Bigquery). >> > >> > One alternative I can think of is to make the association between the >> items >> > in multiple arrays be more explicit e.g by using an outer join on the >> > ordinal position. That way, items at the same position appear together >> in a >> > row and for non-matches we produce Null for the smaller array. I haven't >> > thought through the exact syntax for this yet. >> > >> > -Aman >> > >> > On Fri, Sep 10, 2021 at 11:04 AM Gabor Kaszab <gaborkas...@cloudera.com> >> > wrote: >> > >> > > Hey, >> > > >> > > I'm working on implementing a new UNNEST() operation that does what it >> > > says, unnests arrays. But here the trick would come when there are >> > multiple >> > > UNNESTS() in the select list. Let me explain the desired behaviour with >> > an >> > > example: >> > > >> > > Let's assume the following table: >> > > [image: Screenshot 2021-09-10 at 19.58.53.png] >> > > An the following query: >> > > SELECT id, UNNEST(arr1), UNNEST(arr2) FROM tbl; >> > > >> > > Here the expected result is to not produce a join on the two arrays as >> we >> > > would with the original way of giving them in the select lit, but to >> put >> > > the values of the arrays "next to each other" like this: >> > > [image: Screenshot 2021-09-10 at 19.59.04.png] >> > > I have investigated some options how this could be implemented and >> > created >> > > a doc for my notes. Let me share this with the community so that people >> > can >> > > share their preference. Any feedback is welcome! >> > > >> > > >> > > >> > >> https://docs.google.com/document/d/184EKJwMME4SNzyfOTueI-nz-IL-WUiBeaS8Zhi2XzMo/edit?usp=sharing >> > > >> > > Cheers, >> > > Gabor >> > > >> > > >> > > >> > >>