[
https://issues.apache.org/jira/browse/FLINK-22853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17358501#comment-17358501
]
Jark Wu commented on FLINK-22853:
---------------------------------
Hi [~raypon.wang], I debugged the query and I think the result is as expected.
The reason is: you declared {{id}} as the primary key, so the optimizer removes
the redundant Aggregate that groups by the primary key, and the optimized plan
is just a scan (see the explain result below).
{code}
== Abstract Syntax Tree ==
LogicalAggregate(group=[{0}], EXPR$1=[MAX($1)])
+- LogicalTableScan(table=[[default_catalog, default_database, test]])
== Optimized Physical Plan ==
TableSourceScan(table=[[default_catalog, default_database, test]], fields=[id,
offset])
== Optimized Execution Plan ==
TableSourceScan(table=[[default_catalog, default_database, test]], fields=[id,
offset])
{code}
Flink doesn't own the data, so Flink simply trusts the primary key the user
declares (that's why the syntax is "NOT ENFORCED"). In your case, you declared
incorrect primary key information: {{id}} is not a primary key in the external
system. Once you remove the incorrect primary key definition from the Flink
DDL, the result will be correct.
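For illustration, a minimal sketch of the corrected DDL, with the incorrect PRIMARY KEY clause dropped (connection options are taken from the reporter's example; the password is elided):

{code:sql}
-- Without a declared primary key, the optimizer keeps the Aggregate,
-- so MAX/MIN/SUM are actually computed per group instead of being
-- optimized away into a plain table scan.
CREATE TABLE test (
  id BIGINT,
  `offset` INT
) WITH (
  'connector' = 'jdbc',
  'driver' = 'com.mysql.cj.jdbc.Driver',
  'url' = 'jdbc:mysql://127.0.0.1:3306/test?serverTimezone=Asia/Shanghai',
  'username' = 'root',
  'password' = '...',
  'table-name' = 'test'
);

SELECT id, MAX(`offset`) FROM test GROUP BY id;
{code}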
> Flink SQL functions max/min/sum return duplicated records
> ---------------------------------------------------------
>
> Key: FLINK-22853
> URL: https://issues.apache.org/jira/browse/FLINK-22853
> Project: Flink
> Issue Type: Bug
> Components: Connectors / JDBC, Table SQL / API
> Affects Versions: 1.12.1
> Reporter: Raypon Wang
> Priority: Major
> Attachments: image-2021-06-07-11-00-05-608.png,
> image-2021-06-07-11-00-49-109.png, image-2021-06-07-11-01-02-389.png
>
>
> MySQL data:
> {code}
> id  offset
> 1   1
> 1   3
> 1   2
> {code}
> Flink SQL code (using flink-connector-jdbc_2.12:1.12.1):
>
> {code}
> object FlinkSqlOnJdbcForMysql {
>   def main(args: Array[String]): Unit = {
>     val settings =
>       EnvironmentSettings.newInstance().useBlinkPlanner().inBatchMode().build()
>     val tableEnvironment = TableEnvironment.create(settings)
>     tableEnvironment.executeSql("" +
>       "CREATE TABLE test (" +
>       "  id BIGINT," +
>       "  `offset` INT," +
>       "  PRIMARY KEY (id) NOT ENFORCED" +
>       ") WITH (" +
>       "  'connector' = 'jdbc'," +
>       "  'driver' = 'com.mysql.cj.jdbc.Driver'," +
>       "  'url' = 'jdbc:mysql://127.0.0.1:3306/test?serverTimezone=Asia/Shanghai'," +
>       "  'username' = 'root'," +
>       "  'password' = 'Project.03'," +
>       "  'table-name' = 'test'," +
>       "  'scan.fetch-size' = '1000'," +
>       "  'scan.auto-commit' = 'true'" +
>       ")")
>     tableEnvironment.executeSql(
>       "select id, max(`offset`) from test group by id").print()
>   }
> }
> {code}
>
> result:
> {code}
> +----+--------+
> | id | EXPR$1 |
> +----+--------+
> |  1 |      1 |
> |  1 |      3 |
> |  1 |      2 |
> +----+--------+
> {code}
> The result of max/min/sum contains duplicated records,
> but avg/count/last_value/first_value works as expected.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)