TheR1sing3un commented on code in PR #13360:
URL: https://github.com/apache/hudi/pull/13360#discussion_r2126140446
##########
hudi-spark-datasource/hudi-spark-common/src/main/java/org/apache/hudi/commit/DatasetBulkInsertCommitActionExecutor.java:
##########
@@ -47,47 +35,12 @@ public DatasetBulkInsertCommitActionExecutor(HoodieWriteConfig config,
super(config, writeClient);
}
- @Override
- protected void preExecute() {
- instantTime = writeClient.startCommit();
- table = writeClient.initTable(getWriteOperationType(), Option.ofNullable(instantTime));
- }
-
- @Override
- protected Option<HoodieData<WriteStatus>> doExecute(Dataset<Row> records, boolean arePartitionRecordsSorted) {
- Map<String, String> opts = writeConfig.getProps().entrySet().stream().collect(Collectors.toMap(
- e -> String.valueOf(e.getKey()),
- e -> String.valueOf(e.getValue())));
- Map<String, String> optsOverrides = Collections.singletonMap(
- HoodieInternalConfig.BULKINSERT_ARE_PARTITIONER_RECORDS_SORTED, String.valueOf(arePartitionRecordsSorted));
-
- String targetFormat;
- Map<String, String> customOpts = new HashMap<>(1);
- if (HoodieSparkUtils.isSpark3()) {
- targetFormat = "org.apache.hudi.spark.internal";
- customOpts.put(HoodieInternalConfig.BULKINSERT_INPUT_DATA_SCHEMA_DDL.key(), records.schema().json());
- } else {
- throw new HoodieException("Bulk insert using row writer is not supported with current Spark version."
- + " To use row writer please switch to spark 3");
- }
-
- records.write().format(targetFormat)
Review Comment:
> can you elaborate why logic is customized before for this executor?
I think the history is as follows.
First, there was only the normal bulk insert logic, which at that time wrote through the data source v2 interface directly.
Later, [boneanxs](https://github.com/boneanxs) proposed using bulk insert for other operations such as overwrite, but the code paths were not unified at that time. Instead, the logic in this part was retained as it was.
For reference, see: https://github.com/apache/hudi/pull/8076
<img width="1014" alt="image"
src="https://github.com/user-attachments/assets/fac2a5c0-e2b1-45f4-b1dd-00a47da2a9c1"
/>