Drill currently supports flatten(), which flattens elements in a
repeated field into a set of individual records [1].

My understanding is that flatten is not part of SQL standard (SQL
standard uses UNNEST). It's a concept borrowed from Google's
BigQuery[2]. However, the way Drill uses flatten seems to be
"different" from the way flatten is used in BiqQuery, or UNNEST in
SQL.

1) In Drill, flatten is more like a regular function; it could appear
in SELECT clause, ORDER BY, GROUP BY, but NOT in FROM clause.
    In contrast, in BigQuery, flatten seems to only appear in the FROM
clause, in a way more like table function.

2) Because Drill treats flatten like a regular function, it could be
nested with a regular scalar function. I could not find such usage in
BigQuery (Please point me an example if someone finds an example).

As such, Drill currently allows the following:

Assuming (T1 has one row only and T1.intList  : [ 1, 2, 3] )

Q1:
select flatten(T1.intList) + 1
from
T1;

Would return 3 rows:
2,
3,
4

To me, it is kind of unusual usage to 1) allow flatten to appear
anywhere where a regular scalar function would appear, 2) allow
flatten to be nested with regular scalar function.

Such "flexibility"  probably has not been well tested. For instance,
if I put flatten() in WHERE clause, Drill would throw function
resolution error.

Also, I'm not sure how Drill would do if we put flatten() nested with
regular scalar function inside window aggregate function, with clause
(CTE), etc?

My question is : should Drill restrict flatten such that

1) it will only appear in SELECT clause for now (I understand it would
require significant re-work if we want to use BiqQuery way and use
flatten in FROM clause).
2) it would not allow flatten to be nested with regular scalar function.

With such restriction, the above Q1 could be rewritten into:

select  f + 1
from ( select flatten(T1.intList) as f
           from T1)

I understand there is some discussion to improve flatten() in some
JIRAs.  What does other think about adding this restriction to
flatten?


[1]. https://drill.apache.org/docs/flatten/
[2]. https://cloud.google.com/bigquery/docs/data#flatten

Reply via email to