[
https://issues.apache.org/jira/browse/FLINK-6101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15933763#comment-15933763
]
Fabian Hueske commented on FLINK-6101:
--------------------------------------
I agree, it would be good to fix this.
IMO, it would be nice to fix this without requiring an alias. However, it might
also be handy to use an alias to avoid a repeated specification of the
expression in {{select()}}.
Maybe this can be implemented by using an internal alias that can be
overwritten by an alias in {{groupBy()}}.
> GroupBy fields with arithmetic expression (include UDF) can not be selected
> ---------------------------------------------------------------------------
>
> Key: FLINK-6101
> URL: https://issues.apache.org/jira/browse/FLINK-6101
> Project: Flink
> Issue Type: Bug
> Components: Table API & SQL
> Reporter: lincoln.lee
> Fix For: 1.3.0
>
>
> currently the TableAPI do not support selecting GroupBy fields with
> expression either using original field name or the expression
> {code}
> val t = CollectionDataSets.get5TupleDataSet(env).toTable(tEnv, 'a, 'b, 'c,
> 'd, 'e)
> .groupBy('e, 'b % 3)
> .select('b, 'c.min, 'e, 'a.avg, 'd.count)
> {code}
> caused
> {code}
> org.apache.flink.table.api.ValidationException: Cannot resolve [b] given
> input [e, ('b % 3), TMP_0, TMP_1, TMP_2].
> {code}
> (BTW, this syntax is invalid in RDBMS which will indicate the selected column
> is invalid in the select list because it is not contained in either an
> aggregate function or the GROUP BY clause in SQL Server.)
> and
> {code}
> val t = CollectionDataSets.get5TupleDataSet(env).toTable(tEnv, 'a, 'b, 'c,
> 'd, 'e)
> .groupBy('e, 'b % 3)
> .select('b%3, 'c.min, 'e, 'a.avg, 'd.count)
> {code}
> will also cause
> {code}
> org.apache.flink.table.api.ValidationException: Cannot resolve [b] given
> input [e, ('b % 3), TMP_0, TMP_1, TMP_2].
> {code}
> and add an alias in groupBy clause "group(e, 'b%3 as 'b)" work without avail.
> and apply an UDF doesn’t work either
> {code}
> table.groupBy('a, Mod('b, 3)).select('a, Mod('b, 3), 'c.count, 'c.count,
> 'd.count, 'e.avg)
> org.apache.flink.table.api.ValidationException: Cannot resolve [b] given
> input [a, org.apache.flink.table.api.scala.batch.table.Mod$('b, 3), TMP_0,
> TMP_1, TMP_2].
> {code}
> the only way to get this work can be
> {code}
> val t = CollectionDataSets.get5TupleDataSet(env).toTable(tEnv, 'a, 'b, 'c,
> 'd, 'e)
> .select('a, 'b%3 as 'b, 'c, 'd, 'e)
> .groupBy('e, 'b)
> .select('b, 'c.min, 'e, 'a.avg, 'd.count)
> {code}
> One way to solve this is to add support alias in groupBy clause ( it seems a
> bit odd against SQL though TableAPI has a different groupBy grammar),
> and I prefer to support select original expressions and UDF in groupBy
> clause(make consistent with SQL).
> as thus:
> {code}
> // use expression
> val t = CollectionDataSets.get5TupleDataSet(env).toTable(tEnv, 'a, 'b, 'c,
> 'd, 'e)
> .groupBy('e, 'b % 3)
> .select('b % 3, 'c.min, 'e, 'a.avg, 'd.count)
> // use UDF
> val t = CollectionDataSets.get5TupleDataSet(env).toTable(tEnv, 'a, 'b, 'c,
> 'd, 'e)
> .groupBy('e, Mod('b,3))
> .select(Mod('b,3), 'c.min, 'e, 'a.avg, 'd.count)
> {code}
> After had a look into the code, found there was a problem in the groupBy
> implementation, validation hadn't considered the expressions in groupBy
> clause. it should be noted that a table has been actually changed after
> groupBy operation ( a new Table) and the groupBy keys replace the original
> field reference in essence.
>
> What do you think?
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)