[jira] [Commented] (FLINK-34378) Minibatch join disrupted the original order of input records

Jeyhun Karimov (Jira) Fri, 09 Feb 2024 15:52:31 -0800


    [ 
https://issues.apache.org/jira/browse/FLINK-34378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17816252#comment-17816252
 ]


Jeyhun Karimov commented on FLINK-34378:
----------------------------------------

Hi [~xuyangzhong] the ordering is different even with parallelism 1 because of 
{{Set}} in {{MiniBatch}} operator. IMO this is expected behavior.  

> Minibatch join disrupted the original order of input records
> ------------------------------------------------------------
>
>                 Key: FLINK-34378
>                 URL: https://issues.apache.org/jira/browse/FLINK-34378
>             Project: Flink
>          Issue Type: Technical Debt
>          Components: Table SQL / Runtime
>    Affects Versions: 1.19.0
>            Reporter: xuyang
>            Priority: Major
>             Fix For: 1.19.0
>
>
> I'm not sure if it's a bug. The following case can re-produce this situation.
> {code:java}
> // add it in CalcITCase
> @Test
> def test(): Unit = {
>   env.setParallelism(1)
>   val rows = Seq(
>     row(1, "1"),
>     row(2, "2"),
>     row(3, "3"),
>     row(4, "4"),
>     row(5, "5"),
>     row(6, "6"),
>     row(7, "7"),
>     row(8, "8"))
>   val dataId = TestValuesTableFactory.registerData(rows)
>   val ddl =
>     s"""
>        |CREATE TABLE t1 (
>        |  a int,
>        |  b string
>        |) WITH (
>        |  'connector' = 'values',
>        |  'data-id' = '$dataId',
>        |  'bounded' = 'false'
>        |)
>      """.stripMargin
>   tEnv.executeSql(ddl)
>   val ddl2 =
>     s"""
>        |CREATE TABLE t2 (
>        |  a int,
>        |  b string
>        |) WITH (
>        |  'connector' = 'values',
>        |  'data-id' = '$dataId',
>        |  'bounded' = 'false'
>        |)
>      """.stripMargin
>   tEnv.executeSql(ddl2)
>   tEnv.getConfig.getConfiguration
>     .set(ExecutionConfigOptions.TABLE_EXEC_MINIBATCH_ENABLED, 
> Boolean.box(true))
>   tEnv.getConfig.getConfiguration
>     .set(ExecutionConfigOptions.TABLE_EXEC_MINIBATCH_ALLOW_LATENCY, 
> Duration.ofSeconds(5))
>   tEnv.getConfig.getConfiguration
>     .set(ExecutionConfigOptions.TABLE_EXEC_MINIBATCH_SIZE, Long.box(20L))
>   println(tEnv.sqlQuery("SELECT * from t1 join t2 on t1.a = t2.a").explain())
>   tEnv.executeSql("SELECT * from t1 join t2 on t1.a = t2.a").print()
> }{code}
> Result
> {code:java}
> +----+---+---+---+---+ 
> | op | a | b | a0| b0| 
> +----+---+---+---+---+ 
> | +I | 3 | 3 | 3 | 3 | 
> | +I | 7 | 7 | 7 | 7 | 
> | +I | 2 | 2 | 2 | 2 | 
> | +I | 5 | 5 | 5 | 5 | 
> | +I | 1 | 1 | 1 | 1 | 
> | +I | 6 | 6 | 6 | 6 | 
> | +I | 4 | 4 | 4 | 4 | 
> | +I | 8 | 8 | 8 | 8 | 
> +----+---+---+---+---+
> {code}
> When I do not use minibatch join, the result is :
> {code:java}
> +----+---+---+----+----+
> | op | a | b | a0 | b0 |
> +----+---+---+----+----+
> | +I | 1 | 1 |  1 |  1 |
> | +I | 2 | 2 |  2 |  2 |
> | +I | 3 | 3 |  3 |  3 |
> | +I | 4 | 4 |  4 |  4 |
> | +I | 5 | 5 |  5 |  5 |
> | +I | 6 | 6 |  6 |  6 |
> | +I | 7 | 7 |  7 |  7 |
> | +I | 8 | 8 |  8 |  8 |
> +----+---+---+----+----+
>  {code}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Commented] (FLINK-34378) Minibatch join disrupted the original order of input records

Reply via email to