lsyldliu commented on code in PR #21789:
URL: https://github.com/apache/flink/pull/21789#discussion_r1109562926


##########
docs/content/docs/connectors/table/hive/hive_functions.md:
##########
@@ -73,6 +73,35 @@ Some Hive built-in functions in older versions have [thread safety issues](https
 We recommend users patch their own Hive to fix them.
 {{< /hint >}}
 
+## Use Native Hive Aggregate Functions
+
+If [HiveModule]({{< ref "docs/dev/table/modules" >}}#hivemodule) is loaded with a higher priority than CoreModule, Flink will try to use the Hive built-in function first. And then for Hive built-in aggregation function,
+Flink currently uses sort-based aggregation strategy. Compared to hash-based aggregation strategy, the performance is worse, so from Flink 1.17, we have implemented some of Hive's aggregation functions natively in Flink.
+
+These functions will use the hash-agg strategy and code gen for the fixed-length aggregate buffer to improve performance. Otherwise, sort-agg strategy will be chosen. Currently, only five functions are supported, namely sum/count/avg/min/max,
+and more aggregation functions will be supported in the future. Users can use the native aggregation function by turning on the option `table.exec.hive.native-agg-function.enabled`, which brings significant performance improvement to the job.
+
+<table class="table table-bordered">
+  <thead>
+    <tr>
+        <th class="text-left" style="width: 20%">Key</th>
+        <th class="text-left" style="width: 15%">Default</th>
+        <th class="text-left" style="width: 10%">Type</th>
+        <th class="text-left" style="width: 55%">Description</th>
+    </tr>
+  </thead>
+  <tbody>
+    <tr>
+        <td><h5>table.exec.hive.native-agg-function.enabled</h5></td>
+        <td style="word-wrap: break-word;">false</td>
+        <td>Boolean</td>
+        <td>Enabling to use native aggregate function which use hash-agg strategy that can improve the aggregation performance after loading HiveModule. This is a job-level option, user can enable it per-job.</td>

Review Comment:
   Enabling native Hive aggregate functions allows the hash-agg strategy to be
   used, which can improve aggregation performance. This is a job-level option.
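
   For readers unfamiliar with the two strategies named in the description, here is a minimal conceptual sketch in Python. It is not Flink's implementation: the function names `hash_agg` and `sort_agg` and the `(key, value)` row layout are hypothetical, chosen only for illustration. It shows why sum/count/avg/min/max suit hash aggregation: each key needs only a fixed-length buffer that is updated in a single pass, whereas the sort-based variant must order the input by key before it can aggregate.

```python
from itertools import groupby
from operator import itemgetter


def _finish(total, count, lowest, highest):
    """Turn a fixed-length buffer into the five aggregate results."""
    return {"sum": total, "count": count, "avg": total / count,
            "min": lowest, "max": highest}


def hash_agg(rows):
    """Single pass over unsorted rows; one fixed-length buffer per key."""
    buffers = {}  # key -> [sum, count, min, max]
    for key, value in rows:
        buf = buffers.get(key)
        if buf is None:
            buffers[key] = [value, 1, value, value]
        else:
            buf[0] += value
            buf[1] += 1
            buf[2] = min(buf[2], value)
            buf[3] = max(buf[3], value)
    return {key: _finish(*buf) for key, buf in buffers.items()}


def sort_agg(rows):
    """Sort rows by key first, then aggregate each run of equal keys."""
    ordered = sorted(rows, key=itemgetter(0))
    result = {}
    for key, group in groupby(ordered, key=itemgetter(0)):
        values = [value for _, value in group]
        result[key] = _finish(sum(values), len(values), min(values), max(values))
    return result


rows = [("a", 3), ("b", 10), ("a", 5), ("b", 2), ("c", 7)]
```

   Both functions return the same aggregates for `rows`; the difference is that `sort_agg` pays for ordering the whole input up front, which is the cost the description refers to when it calls the sort-based strategy slower.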



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
