godfreyhe commented on code in PR #21789:
URL: https://github.com/apache/flink/pull/21789#discussion_r1101023309


##########
docs/content.zh/docs/connectors/table/hive/hive_functions.md:
##########
@@ -73,6 +73,34 @@ Some Hive built-in functions in older versions have [thread 
safety issues](https
 We recommend users patch their own Hive to fix them.
 {{< /hint >}}
 
+## Use Native Hive Aggregate Functions
+
+If [HiveModule]({{< ref "docs/dev/table/modules" >}}#hivemodule) is loaded 
with a higher priority than CoreModule, Flink will try to use the Hive built-in 
function first. And then for Hive built-in aggregation function,
+Flink currently uses sort-based aggregation strategy. Compared to hash-based 
aggregation strategy, the performance is one to two times worse, so from Flink 
1.17, we have implemented some of Hive's aggregation functions natively in 
Flink.

Review Comment:
   Unless a specific scenario is given, it is best not to state specific 
performance results here.



##########
docs/content.zh/docs/connectors/table/hive/hive_functions.md:
##########
@@ -73,6 +73,34 @@ Some Hive built-in functions in older versions have [thread 
safety issues](https
 We recommend users patch their own Hive to fix them.
 {{< /hint >}}
 
+## Use Native Hive Aggregate Functions
+
+If [HiveModule]({{< ref "docs/dev/table/modules" >}}#hivemodule) is loaded 
with a higher priority than CoreModule, Flink will try to use the Hive built-in 
function first. And then for Hive built-in aggregation function,
+Flink currently uses sort-based aggregation strategy. Compared to hash-based 
aggregation strategy, the performance is one to two times worse, so from Flink 
1.17, we have implemented some of Hive's aggregation functions natively in 
Flink.
+These functions will use the hash-agg strategy to improve performance. 
Currently, only five functions are supported, namely sum/count/avg/min/max, and 
more aggregation functions will be supported in the future.

Review Comment:
   For a fixed-length agg buffer, we can use hash-agg; otherwise sort-agg will 
be chosen.
   
   Another performance improvement is code generation.
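
To illustrate the distinction drawn in the comment above, here is a minimal, self-contained sketch contrasting the two strategies for an `avg`-style aggregate. It is not Flink's implementation: the class name `AggStrategySketch` and both method names are hypothetical, and the fixed-length buffer is modeled as a plain `long[2]` holding `{sum, count}`. The point is only that a fixed-size buffer can be updated in place inside a hash table, while the sort-based variant must order the rows by key first.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; not taken from the Flink code base.
public class AggStrategySketch {

    /** Hash-based: one fixed-length buffer {sum, count} per key, updated in place. */
    public static Map<String, Double> hashAvg(List<Map.Entry<String, Long>> rows) {
        Map<String, long[]> buffers = new HashMap<>();
        for (Map.Entry<String, Long> row : rows) {
            long[] buf = buffers.computeIfAbsent(row.getKey(), k -> new long[2]);
            buf[0] += row.getValue(); // running sum
            buf[1] += 1;              // running count
        }
        // TreeMap only to make the printed order deterministic.
        Map<String, Double> result = new TreeMap<>();
        for (Map.Entry<String, long[]> e : buffers.entrySet()) {
            result.put(e.getKey(), (double) e.getValue()[0] / e.getValue()[1]);
        }
        return result;
    }

    /** Sort-based: sort rows by key, then aggregate each run of equal keys. */
    public static Map<String, Double> sortAvg(List<Map.Entry<String, Long>> rows) {
        List<Map.Entry<String, Long>> sorted = new ArrayList<>(rows);
        sorted.sort(Map.Entry.comparingByKey());
        Map<String, Double> result = new TreeMap<>();
        int i = 0;
        while (i < sorted.size()) {
            String key = sorted.get(i).getKey();
            long sum = 0;
            long count = 0;
            // Consume the contiguous run of rows that share this key.
            while (i < sorted.size() && sorted.get(i).getKey().equals(key)) {
                sum += sorted.get(i).getValue();
                count++;
                i++;
            }
            result.put(key, (double) sum / count);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Long>> rows = List.of(
                Map.entry("a", 1L), Map.entry("b", 10L), Map.entry("a", 3L));
        System.out.println(hashAvg(rows)); // prints {a=2.0, b=10.0}
        System.out.println(sortAvg(rows)); // prints {a=2.0, b=10.0}
    }
}
```

Both methods return the same result; they differ only in how rows are grouped. This is why a variable-length buffer, which cannot be updated in place this way, would fall back to the sort-based path.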


