[
https://issues.apache.org/jira/browse/FLINK-22853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17358501#comment-17358501
]
Jark Wu commented on FLINK-22853:
---------------------------------
Hi [~raypon.wang], I debugged the query and I think the result is as expected.
The reason is: you declared {{id}} as the primary key, so the optimizer removes
the redundant Aggregate that groups by the primary key, and the optimized plan
is just a scan (see the explain result below).
{code}
== Abstract Syntax Tree ==
LogicalAggregate(group=[{0}], EXPR$1=[MAX($1)])
+- LogicalTableScan(table=[[default_catalog, default_database, test]])
== Optimized Physical Plan ==
TableSourceScan(table=[[default_catalog, default_database, test]], fields=[id,
offset])
== Optimized Execution Plan ==
TableSourceScan(table=[[default_catalog, default_database, test]], fields=[id,
offset])
{code}
Flink doesn't own the data, so Flink simply trusts the primary key the user
declares (that's why the syntax is "NOT ENFORCED"). In your case, you declared
incorrect primary key information: {{id}} is not a primary key in the external
system. Once you remove the incorrect primary key definition from the Flink
DDL, the result will be correct.
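For illustration, a minimal sketch of the corrected DDL, with the incorrect PRIMARY KEY clause dropped (connection options are taken from the reporter's example; the password is elided):

{code:sql}
-- Without a declared primary key, the optimizer keeps the Aggregate,
-- so MAX/MIN/SUM are actually computed per group instead of being
-- optimized away into a plain table scan.
CREATE TABLE test (
  id BIGINT,
  `offset` INT
) WITH (
  'connector' = 'jdbc',
  'driver' = 'com.mysql.cj.jdbc.Driver',
  'url' = 'jdbc:mysql://127.0.0.1:3306/test?serverTimezone=Asia/Shanghai',
  'username' = 'root',
  'password' = '...',
  'table-name' = 'test'
);

SELECT id, MAX(`offset`) FROM test GROUP BY id;
{code}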
> Flink SQL functions max/min/sum return duplicated records
> ---------------------------------------------------------
>
> Key: FLINK-22853
> URL: https://issues.apache.org/jira/browse/FLINK-22853
> Project: Flink
> Issue Type: Bug
> Components: Connectors / JDBC, Table SQL / API
> Affects Versions: 1.12.1
> Reporter: Raypon Wang
> Priority: Major
> Attachments: image-2021-06-07-11-00-05-608.png,
> image-2021-06-07-11-00-49-109.png, image-2021-06-07-11-01-02-389.png
>
>
> MySQL data:
> {code}
> id  offset
> 1   1
> 1   3
> 1   2
> {code}
> Flink SQL code (using flink-connector-jdbc_2.12:1.12.1):
>
> {code}
> object FlinkSqlOnJdbcForMysql {
>   def main(args: Array[String]): Unit = {
>     val settings =
>       EnvironmentSettings.newInstance().useBlinkPlanner().inBatchMode().build()
>     val tableEnvironment = TableEnvironment.create(settings)
>     tableEnvironment.executeSql("" +
>       "CREATE TABLE test (" +
>       "  id BIGINT," +
>       "  `offset` INT," +
>       "  PRIMARY KEY (id) NOT ENFORCED" +
>       ") WITH (" +
>       "  'connector' = 'jdbc'," +
>       "  'driver' = 'com.mysql.cj.jdbc.Driver'," +
>       "  'url' = 'jdbc:mysql://127.0.0.1:3306/test?serverTimezone=Asia/Shanghai'," +
>       "  'username' = 'root'," +
>       "  'password' = 'Project.03'," +
>       "  'table-name' = 'test'," +
>       "  'scan.fetch-size' = '1000'," +
>       "  'scan.auto-commit' = 'true'" +
>       ")")
>     tableEnvironment.executeSql(
>       "select id, max(`offset`) from test group by id").print()
>   }
> }
> {code}
>
> result:
> {code}
> +----+--------+
> | id | EXPR$1 |
> +----+--------+
> |  1 |      1 |
> |  1 |      3 |
> |  1 |      2 |
> +----+--------+
> {code}
> The result of max/min/sum contains duplicated records,
> but avg/count/last_value/first_value works as expected.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)