My suggestion for item 4 to the select clause and come up with a smart way to extend Calcite to support this type of extension. Validator and Sql2Rel aren't currently pluggable in any way (or maybe I'm mistaken) and I think that is a problem. Definitely agree that this should happen in validator and sql2rel rather than not until rel-rel transformations.
-- Jacques Nadeau CTO and Co-Founder, Dremio On Mon, Oct 12, 2015 at 4:55 PM, Jinfeng Ni <[email protected]> wrote: > The first 3 items is same as what I would think of flatten, if we want > to Drill to perform in a SQL way. > > The last item is something that drives me to send the first email. > That is, whether Drill should allow the flexibility of such shortcut, > not only in SELECT clause, but also in other places including > OB/GB/WHERE etc. > > Adding one layer of sql rewrite probably could get the last item go > through Calcite, although I'm not clear how to achieve that in Calcite > as an extension. For instance, where should such rewrite happen, > right after sql validation, but before sql2rel converter? > > The thing is, for now such rewrite happens after Calcite, which seems > to cause various of problem. That's, in some case, such shortcut works > perfectly, while in other cases, such shortcut does not work. My > original thinking is whether it makes sense to block such shortcut > until we could implement the items you suggest. > > > > On Mon, Oct 12, 2015 at 3:42 PM, Jacques Nadeau <[email protected]> > wrote: > > I was recommending the following: > > > > - flatten is table function > > - we should support it in traditional table function syntax > > - we should support lateral join/lateral subqueries > > - we have shortcut for a lateral join combined with a table function used > > in the select clause > > > > The last item could be done as some kind of sql rewrite. Either add > support > > directly to Calcite or as an extension. We should have never forked > Calcite > > (my mistake) and need to get back onto master. I don't see why this > > suggests otherwise. > > > > > > > > > > -- > > Jacques Nadeau > > CTO and Co-Founder, Dremio > > > > On Mon, Oct 12, 2015 at 3:27 PM, Jinfeng Ni <[email protected]> > wrote: > > > >> The first question I have is what is flatten() function. Is it table > >> functions or regular function? > >> > >> If Drill wants to model flatten() as table function, and want to fit > >> flatten() into Calcite's table function structure, then I guess we had > >> better add such restriction. > >> > >> Seems to me Julian's on-going work for table function is similar to > >> what I understand for table function. 1) it appears in FROM clause, 2) > >> As such, it would not nest a table function with a regular function. > >> > >> By "lateral join", are you referring to lateral subqueries [1]? > >> Lateral subqueries seems to also appear in FROM clause. > >> > >> So, if we want Drill to allow the sugared syntax for flatten, but also > >> want to use Calcite's UDTF, my sense is that it is yet another reason > >> for us to use forked Calcite, simply because the way Drill wants is > >> different from what normal SQL planner would do. > >> > >> > >> > >> [1]. > >> > http://www.postgresql.org/docs/9.4/static/queries-table-expressions.html > >> > >> On Mon, Oct 12, 2015 at 2:57 PM, Jacques Nadeau <[email protected]> > >> wrote: > >> > It's funny, I've been looking at this stuff recently in the thinking > >> about > >> > trying to fit Flatten into Calcite's UDTF construct. > >> > > >> > My inclination is that we basically have sugared the syntax for a > lateral > >> > join with a udtf. I think this is a useful syntax when restricted to > the > >> > select clause and think we should make it more general to all udtf's. > >> What > >> > do others think? I think we should also add support for the more > >> > traditional syntax structure. > >> > > >> > Have you looked at Calcite's planning of lateral join and UDTFs as we > may > >> > be able to push this sugared syntax into a clearer pattern. > >> > > >> > With regards to not allowing expression trees with flatten, I'm not > sure > >> > that is a required constraint. > >> > > >> > -- > >> > Jacques Nadeau > >> > CTO and Co-Founder, Dremio > >> > > >> > On Mon, Oct 12, 2015 at 2:44 PM, Jinfeng Ni <[email protected]> > >> wrote: > >> > > >> >> Drill currently supports flatten(), which flattens elements in a > >> >> repeated field into a set of individual records [1]. > >> >> > >> >> My understanding is that flatten is not part of SQL standard (SQL > >> >> standard uses UNNEST). It's a concept borrowed from Google's > >> >> BigQuery[2]. However, the way Drill uses flatten seems to be > >> >> "different" from the way flatten is used in BiqQuery, or UNNEST in > >> >> SQL. > >> >> > >> >> 1) In Drill, flatten is more like a regular function; it could appear > >> >> in SELECT clause, ORDER BY, GROUP BY, but NOT in FROM clause. > >> >> In contrast, in BigQuery, flatten seems to only appear in the > FROM > >> >> clause, in a way more like table function. > >> >> > >> >> 2) Because Drill treats flatten like a regular function, it could be > >> >> nested with a regular scalar function. I could not find such usage in > >> >> BigQuery (Please point me an example if someone finds an example). > >> >> > >> >> As such, Drill currently allows the following: > >> >> > >> >> Assuming (T1 has one row only and T1.intList : [ 1, 2, 3] ) > >> >> > >> >> Q1: > >> >> select flatten(T1.intList) + 1 > >> >> from > >> >> T1; > >> >> > >> >> Would return 3 rows: > >> >> 2, > >> >> 3, > >> >> 4 > >> >> > >> >> To me, it is kind of unusual usage to 1) allow flatten to appear > >> >> anywhere where a regular scalar function would appear, 2) allow > >> >> flatten to be nested with regular scalar function. > >> >> > >> >> Such "flexibility" probably has not been well tested. For instance, > >> >> if I put flatten() in WHERE clause, Drill would throw function > >> >> resolution error. > >> >> > >> >> Also, I'm not sure how Drill would do if we put flatten() nested with > >> >> regular scalar function inside window aggregate function, with clause > >> >> (CTE), etc? > >> >> > >> >> My question is : should Drill restrict flatten such that > >> >> > >> >> 1) it will only appear in SELECT clause for now (I understand it > would > >> >> require significant re-work if we want to use BiqQuery way and use > >> >> flatten in FROM clause). > >> >> 2) it would not allow flatten to be nested with regular scalar > function. > >> >> > >> >> With such restriction, the above Q1 could be rewritten into: > >> >> > >> >> select f + 1 > >> >> from ( select flatten(T1.intList) as f > >> >> from T1) > >> >> > >> >> I understand there is some discussion to improve flatten() in some > >> >> JIRAs. What does other think about adding this restriction to > >> >> flatten? > >> >> > >> >> > >> >> [1]. https://drill.apache.org/docs/flatten/ > >> >> [2]. https://cloud.google.com/bigquery/docs/data#flatten > >> >> > >> >
