godfreyhe commented on code in PR #21789:
URL: https://github.com/apache/flink/pull/21789#discussion_r1101023309
##########
docs/content.zh/docs/connectors/table/hive/hive_functions.md:
##########
@@ -73,6 +73,34 @@ Some Hive built-in functions in older versions have [thread
safety issues](https
We recommend users patch their own Hive to fix them.
{{< /hint >}}
+## Use Native Hive Aggregate Functions
+
+If [HiveModule]({{< ref "docs/dev/table/modules" >}}#hivemodule) is loaded
with a higher priority than CoreModule, Flink will try to use the Hive built-in
function first. And then for Hive built-in aggregation function,
+Flink currently uses sort-based aggregation strategy. Compared to hash-based
aggregation strategy, the performance is one to two times worse, so from Flink
1.17, we have implemented some of Hive's aggregation functions natively in
Flink.
Review Comment:
If no specific scenario is given, it is best not to give specific
performance results here.
##########
docs/content.zh/docs/connectors/table/hive/hive_functions.md:
##########
@@ -73,6 +73,34 @@ Some Hive built-in functions in older versions have [thread
safety issues](https
We recommend users patch their own Hive to fix them.
{{< /hint >}}
+## Use Native Hive Aggregate Functions
+
+If [HiveModule]({{< ref "docs/dev/table/modules" >}}#hivemodule) is loaded
with a higher priority than CoreModule, Flink will try to use the Hive built-in
function first. And then for Hive built-in aggregation function,
+Flink currently uses sort-based aggregation strategy. Compared to hash-based
aggregation strategy, the performance is one to two times worse, so from Flink
1.17, we have implemented some of Hive's aggregation functions natively in
Flink.
+These functions will use the hash-agg strategy to improve performance.
Currently, only five functions are supported, namely sum/count/avg/min/max, and
more aggregation functions will be supported in the future.
Review Comment:
For a fixed-length agg buffer, hash-agg can be used; otherwise sort-agg will
be chosen.
Another performance improvement is code generation (codegen).
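To illustrate the rule above, here is a minimal conceptual sketch in Python. It is not Flink's planner code: the function name `choose_agg_strategy` and the set of fixed-length types are hypothetical, chosen only to show the decision between the two strategies.

```python
# Conceptual sketch only; this is NOT Flink's planner logic.
# The type set and function name below are hypothetical.
FIXED_LENGTH_TYPES = {
    "BOOLEAN", "TINYINT", "SMALLINT", "INT", "BIGINT", "FLOAT", "DOUBLE",
}


def choose_agg_strategy(buffer_types):
    """Pick 'hash-agg' when every agg buffer field is fixed-length, else 'sort-agg'."""
    if all(t.upper() in FIXED_LENGTH_TYPES for t in buffer_types):
        return "hash-agg"
    return "sort-agg"


# A sum over BIGINT, modelled here with a single BIGINT buffer field.
print(choose_agg_strategy(["BIGINT"]))
# An avg buffer modelled as (sum DOUBLE, count BIGINT).
print(choose_agg_strategy(["DOUBLE", "BIGINT"]))
# A buffer holding a variable-length STRING field.
print(choose_agg_strategy(["STRING"]))
```

In this sketch, a single variable-length buffer field is enough to fall back to sort-agg, which is the condition the documentation could state instead of listing only the supported functions.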