lsyldliu commented on code in PR #21789:
URL: https://github.com/apache/flink/pull/21789#discussion_r1109562926
##########
docs/content/docs/connectors/table/hive/hive_functions.md:
##########
@@ -73,6 +73,35 @@ Some Hive built-in functions in older versions have [thread safety issues](https
We recommend users patch their own Hive to fix them.
{{< /hint >}}
+## Use Native Hive Aggregate Functions
+
+If [HiveModule]({{< ref "docs/dev/table/modules" >}}#hivemodule) is loaded with a higher priority than CoreModule, Flink will try to use the Hive built-in function first. For Hive built-in aggregation functions,
+Flink currently uses the sort-based aggregation strategy, which performs worse than the hash-based aggregation strategy. Therefore, starting with Flink 1.17, some of Hive's aggregation functions are implemented natively in Flink.
+
+These native functions use the hash-agg strategy and code generation for the fixed-length aggregate buffer to improve performance; otherwise, the sort-agg strategy is chosen. Currently, only five functions are supported, namely sum/count/avg/min/max,
+and more aggregation functions will be supported in the future. To use the native aggregate functions, turn on the option `table.exec.hive.native-agg-function.enabled`, which can significantly improve job performance.
+
+<table class="table table-bordered">
+ <thead>
+ <tr>
+ <th class="text-left" style="width: 20%">Key</th>
+ <th class="text-left" style="width: 15%">Default</th>
+ <th class="text-left" style="width: 10%">Type</th>
+ <th class="text-left" style="width: 55%">Description</th>
+ </tr>
+ </thead>
+ <tbody>
+ <tr>
+ <td><h5>table.exec.hive.native-agg-function.enabled</h5></td>
+ <td style="word-wrap: break-word;">false</td>
+ <td>Boolean</td>
+ <td>Enabling to use native aggregate function which use hash-agg strategy that can improve the aggregation performance after loading HiveModule. This is a job-level option, user can enable it per-job.</td>
Review Comment:
Enables native Hive aggregate functions, so that the hash-agg strategy can be used to improve aggregation performance. This is a job-level option.
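For context on the two strategies named in the hunk, the difference between sort-based and hash-based aggregation can be sketched in plain Python. This is a conceptual toy, not Flink's implementation. The function names `hash_agg`, `sort_agg`, and `finish` are hypothetical. The `[sum, count, min, max]` list only stands in for a fixed-length aggregate buffer, with avg derived from sum and count.

```python
# Conceptual toy, not Flink's implementation; all names here are hypothetical.
from itertools import groupby
from operator import itemgetter


def finish(buf):
    """Turn a [sum, count, min, max] buffer into the five aggregate results."""
    total, count, lo, hi = buf
    return {"sum": total, "count": count, "avg": total / count, "min": lo, "max": hi}


def hash_agg(rows):
    """Hash-based aggregation: a single pass, one fixed-length buffer per key."""
    buffers = {}  # key -> [sum, count, min, max]
    for key, value in rows:
        buf = buffers.get(key)
        if buf is None:
            buffers[key] = [value, 1, value, value]
        else:
            buf[0] += value
            buf[1] += 1
            buf[2] = min(buf[2], value)
            buf[3] = max(buf[3], value)
    return {key: finish(buf) for key, buf in buffers.items()}


def sort_agg(rows):
    """Sort-based aggregation: sort all rows by key, then aggregate each run."""
    result = {}
    for key, group in groupby(sorted(rows, key=itemgetter(0)), key=itemgetter(0)):
        values = [value for _, value in group]
        result[key] = finish([sum(values), len(values), min(values), max(values)])
    return result


rows = [("a", 3), ("b", 5), ("a", 1), ("b", 7), ("a", 2)]
print(hash_agg(rows)["a"])  # {'sum': 6, 'count': 3, 'avg': 2.0, 'min': 1, 'max': 3}
```

Both functions return the same results. The sort-based version must sort the whole input before it can aggregate anything. The hash-based version touches each row once and keeps only one small buffer per group, which roughly explains why the hunk expects hash-agg to perform better.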