[jira] [Updated] (FLINK-32780) Release Testing: Verify FLIP-324: Introduce Runtime Filter for Flink Batch Jobs

Lijie Wang (Jira) Fri, 25 Aug 2023 01:26:05 -0700


     [ 
https://issues.apache.org/jira/browse/FLINK-32780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Lijie Wang updated FLINK-32780:
-------------------------------
    Description: 
This issue aims to verify FLIP-324: 
https://cwiki.apache.org/confluence/display/FLINK/FLIP-324%3A+Introduce+Runtime+Filter+for+Flink+Batch+Jobs

To enable the runtime filter, set:
table.optimizer.runtime-filter.enabled: true
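
As a rough sketch, the option above could be turned into a statement for the
Flink SQL client as shown below. The helper name render_set_statements is
hypothetical, and the quoted SET 'key' = 'value'; form is an assumption about
the SQL client syntax that should be checked against the Flink version under
test.

```python
def render_set_statements(options):
    """Render one quoted SET statement per option (assumed SQL client syntax)."""
    return [f"SET '{key}' = '{value}';" for key, value in options.items()]


# The only option taken from this issue; further options can be added the same way.
statements = render_set_statements(
    {"table.optimizer.runtime-filter.enabled": "true"}
)
print("\n".join(statements))
```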

1. Create two tables, one small (little data) and one large (a lot of data), 
and run a join query on them (for example, the query from the FLIP doc: 
SELECT * FROM fact, dim WHERE x = a AND z = 2). The Flink table planner must be 
able to obtain statistics for both tables (for example, by using Hive tables). 
The data volume of the small table should be less than 
"table.optimizer.runtime-filter.max-build-data-size", and the data volume of 
the large table should be larger than 
"table.optimizer.runtime-filter.min-probe-data-size".

2. Show the plan of the join query. The plan should include nodes such as 
LocalRuntimeFilterBuilder, GlobalRuntimeFilterBuilder and RuntimeFilter. The 
plans for variants of the above query can be verified in the same way.

3. Execute the above query, and: 
* Check whether the data in the large table has been successfully filtered.
* Verify the execution result: it should be the same as the result of the same 
query executed with the runtime filter disabled.
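
The three steps above could be backed by a few small checks, sketched here in
plain Python. Everything in this block is illustrative: the function names, the
byte thresholds and the sample plan text are hypothetical placeholders, not
output from Flink or values from the FLIP. Only the three node names come from
step 2. Results are compared as multisets because a batch join does not
guarantee row order.

```python
import re
from collections import Counter

# Node names quoted in step 2 of this issue.
EXPECTED_NODES = (
    "LocalRuntimeFilterBuilder",
    "GlobalRuntimeFilterBuilder",
    "RuntimeFilter",
)


def sizes_allow_runtime_filter(build_bytes, probe_bytes,
                               max_build_bytes, min_probe_bytes):
    """Step 1: small side below max-build size, large side above min-probe size."""
    return build_bytes < max_build_bytes and probe_bytes > min_probe_bytes


def missing_nodes(plan_text):
    """Step 2: expected node names that do not occur as whole words in the plan."""
    return [
        name for name in EXPECTED_NODES
        if not re.search(r"\b" + re.escape(name) + r"\b", plan_text)
    ]


def same_rows(rows_with_filter, rows_without_filter):
    """Step 3: compare two results as multisets of rows, ignoring row order."""
    return (Counter(map(tuple, rows_with_filter))
            == Counter(map(tuple, rows_without_filter)))


# Hand-written placeholder, NOT real EXPLAIN output.
sample_plan = """
HashJoin
:- RuntimeFilter
:  :- GlobalRuntimeFilterBuilder
:  :  +- LocalRuntimeFilterBuilder
:  +- TableSourceScan(fact)
+- TableSourceScan(dim)
"""

# Hypothetical sizes and thresholds in bytes, chosen only for illustration.
print(sizes_allow_runtime_filter(5_000_000, 20_000_000_000,
                                 10_000_000, 10_000_000_000))
print(missing_nodes(sample_plan))
print(same_rows([(1, "a"), (2, "b")], [(2, "b"), (1, "a")]))
```

The whole-word match matters because RuntimeFilter is a substring of the two
builder names, so a plain substring test would report it as present even when
only the builders appear in the plan.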

> Release Testing: Verify FLIP-324: Introduce Runtime Filter for Flink Batch 
> Jobs
> -------------------------------------------------------------------------------
>
>                 Key: FLINK-32780
>                 URL: https://issues.apache.org/jira/browse/FLINK-32780
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Tests
>    Affects Versions: 1.18.0
>            Reporter: Qingsheng Ren
>            Assignee: dalongliu
>            Priority: Major
>             Fix For: 1.18.0



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
