YuweiXiao commented on a change in pull request #4441:
URL: https://github.com/apache/hudi/pull/4441#discussion_r814562736
##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/BulkInsertPartitioner.java
##########
@@ -18,24 +18,64 @@
package org.apache.hudi.table;
+import org.apache.hudi.common.fs.FSUtils;
+import org.apache.hudi.io.WriteHandleFactory;
+
+import java.io.Serializable;
+import java.util.List;
+import java.util.stream.Collectors;
+import java.util.stream.IntStream;
+
/**
 * Repartition input records into at least expected number of output spark partitions. It should give below guarantees -
 * Output spark partition will have records from only one hoodie partition. - Average records per output spark
 * partitions should be almost equal to (#inputRecords / #outputSparkPartitions) to avoid possible skews.
*/
-public interface BulkInsertPartitioner<I> {
+public abstract class BulkInsertPartitioner<I> implements Serializable {
+
+ private WriteHandleFactory defaultWriteHandleFactory;
+ private List<String> fileIdPfx;
Review comment:
The overall design is indeed partitionId -> fileIdPrefix (`fileIdPfxList`) and
partitionId -> writeHandleFactory (the `getWriteHandleFactory` interface). I
will go with organizing those in the constructor, which should make the design
clearer.
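To illustrate the design being discussed, here is a minimal, self-contained sketch of the two mappings organized in the constructor. This is not Hudi's actual implementation: `BulkInsertPartitionerSketch` and `WriteHandleFactoryStub` are hypothetical stand-ins for `BulkInsertPartitioner` and `WriteHandleFactory`, and a plain UUID stands in for the prefix that Hudi derives via `FSUtils`.

```java
import java.io.Serializable;
import java.util.List;
import java.util.UUID;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical stand-in for Hudi's WriteHandleFactory.
interface WriteHandleFactoryStub extends Serializable {}

// Illustrative sketch: both mappings (partitionId -> fileIdPrefix and
// partitionId -> writeHandleFactory) are set up in the constructor.
abstract class BulkInsertPartitionerSketch<I> implements Serializable {

  // partitionId -> fileIdPrefix: one pre-generated prefix per output partition.
  protected final List<String> fileIdPfxList;

  // Default factory used for every partition unless a subclass overrides.
  private final WriteHandleFactoryStub defaultWriteHandleFactory;

  protected BulkInsertPartitionerSketch(int numOutputPartitions,
                                        WriteHandleFactoryStub defaultFactory) {
    // A random UUID stands in for Hudi's file id prefix generation.
    this.fileIdPfxList = IntStream.range(0, numOutputPartitions)
        .mapToObj(i -> UUID.randomUUID().toString())
        .collect(Collectors.toList());
    this.defaultWriteHandleFactory = defaultFactory;
  }

  // partitionId -> fileIdPrefix lookup.
  public String getFileIdPfx(int partitionId) {
    return fileIdPfxList.get(partitionId);
  }

  // partitionId -> writeHandleFactory lookup; subclasses may specialize.
  public WriteHandleFactoryStub getWriteHandleFactory(int partitionId) {
    return defaultWriteHandleFactory;
  }
}
```

Keeping both lookups keyed by the output partition id, and materializing them once in the constructor, makes each output Spark partition's file id prefix and write handle deterministic for the lifetime of the partitioner.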
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]