Re: flatten() function, scalar functions, nested ?

Jacques Nadeau Mon, 12 Oct 2015 15:43:49 -0700

I was recommending the following:

- flatten is table function
- we should support it in traditional table function syntax
- we should support lateral join/lateral subqueries
- we have shortcut for a lateral join combined with a table function used
in the select clause


The last item could be done as some kind of sql rewrite. Either add support
directly to Calcite or as an extension. We should have never forked Calcite
(my mistake) and need to get back onto master. I don't see why this
suggests otherwise.




--
Jacques Nadeau
CTO and Co-Founder, Dremio

On Mon, Oct 12, 2015 at 3:27 PM, Jinfeng Ni <[email protected]> wrote:

> The first question I have is what is flatten() function. Is it table
> functions or regular function?
>
> If Drill wants to model flatten() as table function, and want to fit
> flatten() into Calcite's table function structure, then I guess we had
> better add such restriction.
>
> Seems to me Julian's on-going work for table function is similar to
> what I understand for table function. 1) it appears in FROM clause, 2)
> As such, it would not nest a table function with a regular function.
>
> By "lateral join", are you referring to lateral subqueries [1]?
> Lateral subqueries seems to also appear in FROM clause.
>
> So, if we want Drill to allow the sugared syntax for flatten, but also
> want to use Calcite's UDTF, my sense is that it is yet another reason
> for us to use forked Calcite, simply because the way Drill wants is
> different from what normal SQL planner would do.
>
>
>
> [1].
> http://www.postgresql.org/docs/9.4/static/queries-table-expressions.html
>
> On Mon, Oct 12, 2015 at 2:57 PM, Jacques Nadeau <[email protected]>
> wrote:
> > It's funny, I've been looking at this stuff recently in the thinking
> about
> > trying to fit Flatten into Calcite's UDTF construct.
> >
> > My inclination is that we basically have sugared the syntax for a lateral
> > join with a udtf. I think this is a useful syntax when restricted to the
> > select clause and think we should make it more general to all udtf's.
> What
> > do others think? I think we should also add support for the more
> > traditional syntax structure.
> >
> > Have you looked at Calcite's planning of lateral join and UDTFs as we may
> > be able to push this sugared syntax into a clearer pattern.
> >
> > With regards to not allowing expression trees with flatten, I'm not sure
> > that is a required constraint.
> >
> > --
> > Jacques Nadeau
> > CTO and Co-Founder, Dremio
> >
> > On Mon, Oct 12, 2015 at 2:44 PM, Jinfeng Ni <[email protected]>
> wrote:
> >
> >> Drill currently supports flatten(), which flattens elements in a
> >> repeated field into a set of individual records [1].
> >>
> >> My understanding is that flatten is not part of SQL standard (SQL
> >> standard uses UNNEST). It's a concept borrowed from Google's
> >> BigQuery[2]. However, the way Drill uses flatten seems to be
> >> "different" from the way flatten is used in BiqQuery, or UNNEST in
> >> SQL.
> >>
> >> 1) In Drill, flatten is more like a regular function; it could appear
> >> in SELECT clause, ORDER BY, GROUP BY, but NOT in FROM clause.
> >>     In contrast, in BigQuery, flatten seems to only appear in the FROM
> >> clause, in a way more like table function.
> >>
> >> 2) Because Drill treats flatten like a regular function, it could be
> >> nested with a regular scalar function. I could not find such usage in
> >> BigQuery (Please point me an example if someone finds an example).
> >>
> >> As such, Drill currently allows the following:
> >>
> >> Assuming (T1 has one row only and T1.intList  : [ 1, 2, 3] )
> >>
> >> Q1:
> >> select flatten(T1.intList) + 1
> >> from
> >> T1;
> >>
> >> Would return 3 rows:
> >> 2,
> >> 3,
> >> 4
> >>
> >> To me, it is kind of unusual usage to 1) allow flatten to appear
> >> anywhere where a regular scalar function would appear, 2) allow
> >> flatten to be nested with regular scalar function.
> >>
> >> Such "flexibility"  probably has not been well tested. For instance,
> >> if I put flatten() in WHERE clause, Drill would throw function
> >> resolution error.
> >>
> >> Also, I'm not sure how Drill would do if we put flatten() nested with
> >> regular scalar function inside window aggregate function, with clause
> >> (CTE), etc?
> >>
> >> My question is : should Drill restrict flatten such that
> >>
> >> 1) it will only appear in SELECT clause for now (I understand it would
> >> require significant re-work if we want to use BiqQuery way and use
> >> flatten in FROM clause).
> >> 2) it would not allow flatten to be nested with regular scalar function.
> >>
> >> With such restriction, the above Q1 could be rewritten into:
> >>
> >> select  f + 1
> >> from ( select flatten(T1.intList) as f
> >>            from T1)
> >>
> >> I understand there is some discussion to improve flatten() in some
> >> JIRAs.  What does other think about adding this restriction to
> >> flatten?
> >>
> >>
> >> [1]. https://drill.apache.org/docs/flatten/
> >> [2]. https://cloud.google.com/bigquery/docs/data#flatten
> >>
>

Re: flatten() function, scalar functions, nested ?

Reply via email to