wuchong commented on code in PR #20419:
URL: https://github.com/apache/flink/pull/20419#discussion_r940136586
##########
flink-connectors/flink-connector-hive/src/main/java/org/apache/flink/connectors/hive/HiveOptions.java:
##########
@@ -77,6 +77,19 @@ public class HiveOptions {
                    .withDescription(
                            "The thread number to split hive's partitions to splits. It should be bigger than 0.");
+    public static final ConfigOption<Boolean> TABLE_EXEC_HIVE_DYNAMIC_GROUPING_ENABLED =
+            key("table.exec.hive.dynamic-grouping.enabled")
Review Comment:
`table.exec.hive.sink.sort-by-dynamic-partition.enable`
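For reference, a rough sketch of what the full declaration could look like with the suggested key, following the builder pattern of the surrounding options; the default value and description wording are assumptions for illustration, not code from this PR:

```java
// Sketch only: default value and description are assumptions, not from the PR.
public static final ConfigOption<Boolean> TABLE_EXEC_HIVE_SINK_SORT_BY_DYNAMIC_PARTITION_ENABLE =
        key("table.exec.hive.sink.sort-by-dynamic-partition.enable")
                .booleanType()
                .defaultValue(true)
                .withDescription(
                        "Whether to additionally sort the data by dynamic partition columns before writing into a Hive sink table.");
```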
##########
docs/content/docs/connectors/table/hive/hive_read_write.md:
##########
@@ -472,6 +472,14 @@ This configuration is set in the `TableConfig` and will affect all sinks of the
</tbody>
</table>
+### Configuration for Dynamic Partition Inserting
+By default, if it's for dynamic partition inserting, Flink will sort the data additionally by dynamic partition columns before writing into sink table.
Review Comment:
Add the following words:
That means the sink will receive all elements of one partition and then all elements of another partition. Elements of different partitions will not be mixed. This is helpful for the Hive sink to reduce the number of partition writers and improve writing performance by writing one partition at a time. Otherwise, too many partition writers may cause an OutOfMemory exception.
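For readers following along, a minimal sketch of how a job could toggle this behavior; the option key is the one from this PR (it may still be renamed per the comment above), and the string-keyed `TableConfig#set` is assumed to be available in the targeted Flink version:

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class DynamicGroupingToggle {
    public static void main(String[] args) {
        // Dynamic partition grouping applies to bounded (batch) writes.
        TableEnvironment tEnv = TableEnvironment.create(EnvironmentSettings.inBatchMode());
        // Keep the extra sort (the default) so each writer handles one
        // partition at a time; "false" skips the sort at the risk of many
        // concurrent partition writers.
        tEnv.getConfig().set("table.exec.hive.dynamic-grouping.enabled", "true");
    }
}
```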
##########
docs/content/docs/connectors/table/hive/hive_read_write.md:
##########
@@ -472,6 +472,14 @@ This configuration is set in the `TableConfig` and will affect all sinks of the
</tbody>
</table>
+### Configuration for Dynamic Partition Inserting
Review Comment:
```suggestion
### Dynamic Partition Writing
```
##########
docs/content/docs/connectors/table/hive/hive_read_write.md:
##########
@@ -472,6 +472,14 @@ This configuration is set in the `TableConfig` and will affect all sinks of the
</tbody>
</table>
+### Configuration for Dynamic Partition Inserting
+By default, if it's for dynamic partition inserting, Flink will sort the data additionally by dynamic partition columns before writing into sink table.
+
+To avoid the extra sorting, you can set job configuration `table.exec.hive.dynamic-grouping.enabled` (`true` by default) to `false`.
+But with such configuration, it'll throw OOM exception if there are too may dynamic partitions.
+
Review Comment:
Add some hints about how to tune dynamic partition writing. For example, add `DISTRIBUTED BY <partition_fields>` for hash shuffling when the data is not skewed. You can also manually add `SORTED BY <partition_fields>` to achieve the same effect as `table.exec.hive.dynamic-grouping.enabled=true`.
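As a concrete illustration of the first hint, a hypothetical sketch; the table names and schema are invented, and Hive's query syntax spells the clause `DISTRIBUTE BY`, so the exact spelling should be checked against the dialect docs:

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.SqlDialect;
import org.apache.flink.table.api.TableEnvironment;

public class DynamicPartitionShuffle {
    public static void main(String[] args) {
        TableEnvironment tEnv = TableEnvironment.create(EnvironmentSettings.inBatchMode());
        tEnv.getConfig().setSqlDialect(SqlDialect.HIVE);
        // Hash-shuffle rows by the dynamic partition column `dt` so each
        // writer receives only a few partitions (best when `dt` is not skewed).
        tEnv.executeSql(
                "INSERT OVERWRITE TABLE logs_sink PARTITION (dt) "
                        + "SELECT id, payload, dt FROM logs_source DISTRIBUTE BY dt");
    }
}
```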
##########
docs/content/docs/connectors/table/hive/hive_read_write.md:
##########
@@ -472,6 +472,14 @@ This configuration is set in the `TableConfig` and will affect all sinks of the
</tbody>
</table>
+### Configuration for Dynamic Partition Inserting
Review Comment:
Introduce what dynamic partition writing is at the beginning.
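To sketch what such an introduction might convey, a hypothetical contrast between static and dynamic partition writing; the table names and columns are invented for illustration:

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.SqlDialect;
import org.apache.flink.table.api.TableEnvironment;

public class PartitionWritingContrast {
    public static void main(String[] args) {
        TableEnvironment tEnv = TableEnvironment.create(EnvironmentSettings.inBatchMode());
        tEnv.getConfig().setSqlDialect(SqlDialect.HIVE);
        // Static partition writing: the partition value is fixed in the
        // statement, so exactly one partition is written.
        tEnv.executeSql(
                "INSERT OVERWRITE TABLE logs_sink PARTITION (dt='2022-08-03') "
                        + "SELECT id, payload FROM logs_source WHERE dt='2022-08-03'");
        // Dynamic partition writing: the partition value comes from the data
        // itself, so a single statement may fan out into many partitions.
        tEnv.executeSql(
                "INSERT OVERWRITE TABLE logs_sink PARTITION (dt) "
                        + "SELECT id, payload, dt FROM logs_source");
    }
}
```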
##########
docs/content/docs/connectors/table/hive/hive_read_write.md:
##########
@@ -472,6 +472,14 @@ This configuration is set in the `TableConfig` and will affect all sinks of the
</tbody>
</table>
+### Configuration for Dynamic Partition Inserting
+By default, if it's for dynamic partition inserting, Flink will sort the data additionally by dynamic partition columns before writing into sink table.
+
+To avoid the extra sorting, you can set job configuration `table.exec.hive.dynamic-grouping.enabled` (`true` by default) to `false`.
+But with such configuration, it'll throw OOM exception if there are too may dynamic partitions.
Review Comment:
```suggestion
But with such a configuration, it may throw an OutOfMemory exception if there are too many dynamic partitions.
```