[ https://issues.apache.org/jira/browse/FLINK-31060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yunhong Zheng updated FLINK-31060:
----------------------------------
    Description:
This issue aims to verify FLINK-30542: Support adaptive local hash aggregate in runtime.

Adaptive local hash aggregation is an optimization of local hash aggregation. It adaptively decides whether to continue local hash aggregation based on the distinct value rate of the sampled data. If the distinct value rate is greater than the configured threshold (see parameter: 'table.exec.local-hash-agg.adaptive.distinct-value-rate-threshold'), we stop aggregating and just send the input data downstream after a simple projection. Otherwise, we continue to aggregate.

We can verify it in the SQL client after building the flink-dist package.
# First, create a source table. (Note: the source table needs to produce data with varying degrees of aggregation, meaning the distinct count should be controllable through the source connector. We recommend modifying the DataGen table source so that it produces data with different numbers of distinct rows.)
# Verify the result with different distinct value rates. (See: table.exec.local-hash-agg.adaptive.distinct-value-rate-threshold)
# Check the log in the 'TM' to see whether the adaptive local hash aggregate works.

If you run into any problems, feel free to ping me directly.

    was:This issue aims to verify FLINK-30542: [Support adaptive local hash aggregate in runtime|https://issues.apache.org/jira/browse/FLINK-30542].

> Release Testing: Verify FLINK-30542 Support adaptive local hash aggregate in runtime
> ------------------------------------------------------------------------------------
>
>                 Key: FLINK-31060
>                 URL: https://issues.apache.org/jira/browse/FLINK-31060
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Table SQL / Runtime
>    Affects Versions: 1.17.0
>            Reporter: Yunhong Zheng
>            Priority: Major
>             Fix For: 1.17.0
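
For anyone testing this, the decision rule described above can be sketched roughly as follows. This is only an illustration of the idea, not the operator code from FLINK-30542: the class name AdaptiveLocalAggSketch, the method keepAggregating, and the threshold value 0.5 used in main are all assumptions made up for the example.

```java
// Hypothetical sketch of the adaptive decision; not Flink's actual implementation.
final class AdaptiveLocalAggSketch {

    /**
     * Returns true if local aggregation should continue, false if the operator
     * should fall back to forwarding rows downstream after a simple projection.
     *
     * sampledKeys: grouping keys of the sampled input rows (assumed representation).
     * threshold:   plays the role of
     *              'table.exec.local-hash-agg.adaptive.distinct-value-rate-threshold'.
     */
    static boolean keepAggregating(java.util.List<String> sampledKeys, double threshold) {
        if (sampledKeys.isEmpty()) {
            // Nothing sampled yet, so there is no reason to stop aggregating.
            return true;
        }
        java.util.Set<String> distinctKeys = new java.util.HashSet<>(sampledKeys);
        // Distinct value rate = number of distinct keys / number of sampled rows.
        double distinctValueRate = (double) distinctKeys.size() / sampledKeys.size();
        // Stop only when the rate is strictly greater than the threshold.
        return distinctValueRate <= threshold;
    }

    public static void main(String[] args) {
        // Few distinct keys: local aggregation reduces the data, so keep it.
        System.out.println(keepAggregating(java.util.List.of("a", "a", "b", "a"), 0.5));
        // Every key distinct: local aggregation reduces nothing, so stop.
        System.out.println(keepAggregating(java.util.List.of("a", "b", "c", "d"), 0.5));
    }
}
```

With a source that produces mostly repeated keys, the sketch keeps aggregating; with a source where nearly every row has its own key, it switches to pass-through. Those are the two cases the verification steps are meant to exercise.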