My suggestion for item 4 to the select clause and come up with a smart way
to extend Calcite to support this type of extension. Validator and Sql2Rel
aren't currently pluggable in any way (or maybe I'm mistaken) and I think
that is a problem. Definitely agree that this should happen in validator
and sql2rel rather than not until rel-rel transformations.

--
Jacques Nadeau
CTO and Co-Founder, Dremio

On Mon, Oct 12, 2015 at 4:55 PM, Jinfeng Ni <[email protected]> wrote:

> The first 3 items is same as what I would think of flatten, if we want
> to Drill to perform in a SQL way.
>
> The last item is something that drives me to send the first email.
> That is, whether Drill should allow the flexibility of such shortcut,
> not only in SELECT clause, but also in other places including
> OB/GB/WHERE etc.
>
> Adding one layer of sql rewrite probably could get the last item go
> through Calcite, although I'm not clear how to achieve that in Calcite
> as an extension.  For instance, where should such rewrite happen,
> right after sql validation, but before sql2rel converter?
>
> The thing is, for now such rewrite happens after Calcite, which seems
> to cause various of problem. That's, in some case, such shortcut works
> perfectly, while in other cases, such shortcut does not work. My
> original thinking is whether it makes sense to block such shortcut
> until we could implement the items you suggest.
>
>
>
> On Mon, Oct 12, 2015 at 3:42 PM, Jacques Nadeau <[email protected]>
> wrote:
> > I was recommending the following:
> >
> > - flatten is table function
> > - we should support it in traditional table function syntax
> > - we should support lateral join/lateral subqueries
> > - we have shortcut for a lateral join combined with a table function used
> > in the select clause
> >
> > The last item could be done as some kind of sql rewrite. Either add
> support
> > directly to Calcite or as an extension. We should have never forked
> Calcite
> > (my mistake) and need to get back onto master. I don't see why this
> > suggests otherwise.
> >
> >
> >
> >
> > --
> > Jacques Nadeau
> > CTO and Co-Founder, Dremio
> >
> > On Mon, Oct 12, 2015 at 3:27 PM, Jinfeng Ni <[email protected]>
> wrote:
> >
> >> The first question I have is what is flatten() function. Is it table
> >> functions or regular function?
> >>
> >> If Drill wants to model flatten() as table function, and want to fit
> >> flatten() into Calcite's table function structure, then I guess we had
> >> better add such restriction.
> >>
> >> Seems to me Julian's on-going work for table function is similar to
> >> what I understand for table function. 1) it appears in FROM clause, 2)
> >> As such, it would not nest a table function with a regular function.
> >>
> >> By "lateral join", are you referring to lateral subqueries [1]?
> >> Lateral subqueries seems to also appear in FROM clause.
> >>
> >> So, if we want Drill to allow the sugared syntax for flatten, but also
> >> want to use Calcite's UDTF, my sense is that it is yet another reason
> >> for us to use forked Calcite, simply because the way Drill wants is
> >> different from what normal SQL planner would do.
> >>
> >>
> >>
> >> [1].
> >>
> http://www.postgresql.org/docs/9.4/static/queries-table-expressions.html
> >>
> >> On Mon, Oct 12, 2015 at 2:57 PM, Jacques Nadeau <[email protected]>
> >> wrote:
> >> > It's funny, I've been looking at this stuff recently in the thinking
> >> about
> >> > trying to fit Flatten into Calcite's UDTF construct.
> >> >
> >> > My inclination is that we basically have sugared the syntax for a
> lateral
> >> > join with a udtf. I think this is a useful syntax when restricted to
> the
> >> > select clause and think we should make it more general to all udtf's.
> >> What
> >> > do others think? I think we should also add support for the more
> >> > traditional syntax structure.
> >> >
> >> > Have you looked at Calcite's planning of lateral join and UDTFs as we
> may
> >> > be able to push this sugared syntax into a clearer pattern.
> >> >
> >> > With regards to not allowing expression trees with flatten, I'm not
> sure
> >> > that is a required constraint.
> >> >
> >> > --
> >> > Jacques Nadeau
> >> > CTO and Co-Founder, Dremio
> >> >
> >> > On Mon, Oct 12, 2015 at 2:44 PM, Jinfeng Ni <[email protected]>
> >> wrote:
> >> >
> >> >> Drill currently supports flatten(), which flattens elements in a
> >> >> repeated field into a set of individual records [1].
> >> >>
> >> >> My understanding is that flatten is not part of SQL standard (SQL
> >> >> standard uses UNNEST). It's a concept borrowed from Google's
> >> >> BigQuery[2]. However, the way Drill uses flatten seems to be
> >> >> "different" from the way flatten is used in BiqQuery, or UNNEST in
> >> >> SQL.
> >> >>
> >> >> 1) In Drill, flatten is more like a regular function; it could appear
> >> >> in SELECT clause, ORDER BY, GROUP BY, but NOT in FROM clause.
> >> >>     In contrast, in BigQuery, flatten seems to only appear in the
> FROM
> >> >> clause, in a way more like table function.
> >> >>
> >> >> 2) Because Drill treats flatten like a regular function, it could be
> >> >> nested with a regular scalar function. I could not find such usage in
> >> >> BigQuery (Please point me an example if someone finds an example).
> >> >>
> >> >> As such, Drill currently allows the following:
> >> >>
> >> >> Assuming (T1 has one row only and T1.intList  : [ 1, 2, 3] )
> >> >>
> >> >> Q1:
> >> >> select flatten(T1.intList) + 1
> >> >> from
> >> >> T1;
> >> >>
> >> >> Would return 3 rows:
> >> >> 2,
> >> >> 3,
> >> >> 4
> >> >>
> >> >> To me, it is kind of unusual usage to 1) allow flatten to appear
> >> >> anywhere where a regular scalar function would appear, 2) allow
> >> >> flatten to be nested with regular scalar function.
> >> >>
> >> >> Such "flexibility"  probably has not been well tested. For instance,
> >> >> if I put flatten() in WHERE clause, Drill would throw function
> >> >> resolution error.
> >> >>
> >> >> Also, I'm not sure how Drill would do if we put flatten() nested with
> >> >> regular scalar function inside window aggregate function, with clause
> >> >> (CTE), etc?
> >> >>
> >> >> My question is : should Drill restrict flatten such that
> >> >>
> >> >> 1) it will only appear in SELECT clause for now (I understand it
> would
> >> >> require significant re-work if we want to use BiqQuery way and use
> >> >> flatten in FROM clause).
> >> >> 2) it would not allow flatten to be nested with regular scalar
> function.
> >> >>
> >> >> With such restriction, the above Q1 could be rewritten into:
> >> >>
> >> >> select  f + 1
> >> >> from ( select flatten(T1.intList) as f
> >> >>            from T1)
> >> >>
> >> >> I understand there is some discussion to improve flatten() in some
> >> >> JIRAs.  What does other think about adding this restriction to
> >> >> flatten?
> >> >>
> >> >>
> >> >> [1]. https://drill.apache.org/docs/flatten/
> >> >> [2]. https://cloud.google.com/bigquery/docs/data#flatten
> >> >>
> >>
>

Reply via email to