[ 
https://issues.apache.org/jira/browse/FLINK-6428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16358201#comment-16358201
 ] 

Fabian Hueske commented on FLINK-6428:
--------------------------------------

Hi [~lynchlee]

The query {{SELECT DISTINCT a, b, c FROM t GROUP BY a}} does not work because 
it is ill-defined: {{b}} and {{c}} are neither grouping keys nor aggregated.
What should the query return for the following input?

{code}
a | b | c
--
1 | 1 | 1
1 | 2 | 2
{code} 

Clearly, we can only return a single row, because we group on {{a}} and there 
is only one distinct value for {{a}}. But which values should be returned for 
{{b}} and {{c}} in that row?
We cannot return all values, so we would have to pick one. Any pick is 
arbitrary, so the result would be non-deterministic. 
Apache Calcite (which Flink uses as its SQL parser and optimizer) does not 
support this query, and IMO that's correct.
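
To make the ambiguity concrete, here is a minimal Python sketch of the example above. It is plain Python, not Flink or Calcite code, and all function names are made up for illustration. It shows that the group {{a = 1}} has two candidate {{(b, c)}} pairs, and it emulates two queries that are well-defined (my assumption of what might be intended): {{SELECT DISTINCT a, b, c FROM t}} and {{SELECT a, MIN(b), MAX(c) FROM t GROUP BY a}}.

```python
from collections import defaultdict

# Input rows from the example above, as (a, b, c) tuples.
rows = [(1, 1, 1), (1, 2, 2)]


def candidates_per_group(rows):
    """Group on a and collect every distinct (b, c) pair seen in each group."""
    groups = defaultdict(set)
    for a, b, c in rows:
        groups[a].add((b, c))
    return dict(groups)


def select_distinct(rows):
    """Emulates SELECT DISTINCT a, b, c FROM t.

    De-duplicates whole rows and keeps first-seen order.
    """
    return list(dict.fromkeys(rows))


def group_by_a_with_aggregates(rows):
    """Emulates SELECT a, MIN(b), MAX(c) FROM t GROUP BY a.

    Returns one row per group; the aggregates fix the values of b and c.
    """
    groups = defaultdict(list)
    for a, b, c in rows:
        groups[a].append((b, c))
    return [
        (a, min(b for b, _ in pairs), max(c for _, c in pairs))
        for a, pairs in sorted(groups.items())
    ]


if __name__ == "__main__":
    # More than one candidate pair per group means a single output row
    # per group cannot be chosen without an arbitrary pick.
    for a, pairs in candidates_per_group(rows).items():
        print(f"a={a}: {len(pairs)} candidate (b, c) pairs")
    print(select_distinct(rows))
    print(group_by_a_with_aggregates(rows))
```

Either alternative returns the same result on every run, because every selected column is either part of the de-duplicated row or pinned down by an aggregate.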


> Add support DISTINCT in dataStream SQL
> --------------------------------------
>
>                 Key: FLINK-6428
>                 URL: https://issues.apache.org/jira/browse/FLINK-6428
>             Project: Flink
>          Issue Type: New Feature
>          Components: Table API & SQL
>            Reporter: sunjincheng
>            Assignee: sunjincheng
>            Priority: Major
>
> Add support DISTINCT in dataStream SQL as follows:
> DATA:
> {code}
> (name, age)
> (kevin, 28),
> (sunny, 6),
> (jack, 6)
> {code}
> SQL:
> {code}
> SELECT DISTINCT age FROM MyTable
> {code}
> RESULTS:
> {code}
> 28, 6
> {code}
> To DataStream:
> {code}
> inputDS
>   .keyBy() // KeyBy on all fields
>   .flatMap() //  Eliminate duplicate data
> {code}
> [~fhueske] do we need this feature?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
