godfreyhe commented on code in PR #20694:
URL: https://github.com/apache/flink/pull/20694#discussion_r975070878
##########
docs/content/docs/connectors/table/hive/hive_read_write.md:
##########
@@ -198,8 +198,13 @@ Users can do some performance tuning by tuning the split's size with the follow
</tr>
</tbody>
</table>
-
-**NOTE**: Currently, these two configurations only works for the Hive table stored as ORC format.
+{{< hint warning >}}
+**NOTE**:
+- To tune the split's size, Flink will first get all files' size for all partitions.
+  If there are too many partitions, it maybe time-consuming,
+  then you can configure the job configuration `table.exec.calculate-partition-size.thread-num` (3 by default) to a bigger value to enable more threads to speed the process.
Review Comment:
speed up
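For context on the hunk above: the new option is a job-level configuration, so a user would typically set it on the table config before reading the Hive table. A minimal sketch, assuming Flink's standard TableEnvironment configuration API; the key is written as it appears in the hunk, and a later comment in this review suggests renaming it to include `hive`:

import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class HiveSplitTuningExample {
    public static void main(String[] args) {
        TableEnvironment tEnv =
                TableEnvironment.create(EnvironmentSettings.newInstance().inBatchMode().build());

        // Raise the number of threads used to collect partition file sizes
        // (the option documented in the hunk above; default is 3).
        tEnv.getConfig()
                .getConfiguration()
                .setString("table.exec.calculate-partition-size.thread-num", "8");
    }
}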
##########
flink-connectors/flink-connector-hive/src/test/java/org/apache/flink/connectors/hive/HiveSourceFileEnumeratorTest.java:
##########
@@ -72,6 +74,7 @@ public void testCreateInputSplits() throws Exception {
         // set split max size and verify it works
         jobConf = new JobConf();
+        jobConf.set(HiveOptions.TABLE_EXEC_HIVE_CALCULATE_PARTITION_SIZE_THREAD_NUM.key(), "1");
Review Comment:
do we have any tests that verify multiple threads with multiple partitions?
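Not a substitute for an actual unit test in HiveSourceFileEnumeratorTest, but as a rough illustration of what such a test would exercise (summing file sizes across several partition directories with more than one thread), here is a self-contained sketch. Everything below, including the class name, the totalSize helper, and the paths, is hypothetical and not part of the PR:

import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelPartitionSizeSketch {

    // Sum file sizes of each partition directory using a fixed-size pool,
    // mirroring what a thread-num of N would allow.
    static long totalSize(List<File> partitionDirs, int threadNum) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threadNum);
        try {
            List<Future<Long>> futures = new ArrayList<>();
            for (File dir : partitionDirs) {
                futures.add(
                        pool.submit(
                                () -> {
                                    long sum = 0L;
                                    File[] files = dir.listFiles();
                                    if (files != null) {
                                        for (File f : files) {
                                            sum += f.length();
                                        }
                                    }
                                    return sum;
                                }));
            }
            long total = 0L;
            for (Future<Long> f : futures) {
                total += f.get();
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical partition directories; a real test would create them with test data.
        List<File> partitions = Arrays.asList(new File("/tmp/p1"), new File("/tmp/p2"));
        System.out.println(totalSize(partitions, 3));
    }
}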
##########
flink-connectors/flink-connector-hive/src/main/java/org/apache/flink/connectors/hive/HiveOptions.java:
##########
@@ -96,6 +96,12 @@ public class HiveOptions {
                            + " When the value is over estimated, Flink will tend to pack Hive's data into less splits, which will be helpful when Hive's table contains many small files."
                            + " And vice versa. It only works for the Hive table stored as ORC format.");
+    public static final ConfigOption<Integer> TABLE_EXEC_HIVE_CALCULATE_PARTITION_SIZE_THREAD_NUM =
+            key("table.exec.calculate-partition-size.thread-num")
Review Comment:
table.exec.hive.calculate-partition-size.thread-num
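For reference, a sketch of how the declaration might look with the suggested key, following the ConfigOptions builder pattern already used in HiveOptions and the default of 3 mentioned in the documentation hunk; the description text below is illustrative only:

import org.apache.flink.configuration.ConfigOption;

import static org.apache.flink.configuration.ConfigOptions.key;

public class HiveOptionsSketch {
    // Sketch only: the key follows the review suggestion, the default of 3
    // follows the documentation hunk, and the description text is illustrative.
    public static final ConfigOption<Integer>
            TABLE_EXEC_HIVE_CALCULATE_PARTITION_SIZE_THREAD_NUM =
                    key("table.exec.hive.calculate-partition-size.thread-num")
                            .intType()
                            .defaultValue(3)
                            .withDescription(
                                    "The number of threads used to calculate partition sizes"
                                            + " when tuning Hive source split sizes.");
}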
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]