xushiyan commented on code in PR #5629:
URL: https://github.com/apache/hudi/pull/5629#discussion_r909038899
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java:
##########
@@ -142,6 +146,11 @@ public class HoodieWriteConfig extends HoodieConfig {
.withDocumentation("Easily configure one the built-in key generators,
instead of specifying the key generator class."
+ "Currently supports SIMPLE, COMPLEX, TIMESTAMP, CUSTOM,
NON_PARTITION, GLOBAL_DELETE");
+ public static final ConfigProperty<String> RECORD_TYPE = ConfigProperty
+ .key("hoodie.datasource.write.record.type")
+ .defaultValue(HoodieRecordType.AVRO.toString())
+ .withDocumentation("test");
Review Comment:
reminder to fix the doc
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/commit/HoodieDeleteHelper.java:
##########
@@ -84,8 +87,19 @@ public HoodieWriteMetadata<HoodieData<WriteStatus>>
execute(String instantTime,
dedupedKeys = keys.repartition(parallelism);
}
- HoodieData<HoodieRecord<T>> dedupedRecords =
- dedupedKeys.map(key -> new HoodieAvroRecord(key, new
EmptyHoodieRecordPayload()));
+ HoodieData dedupedRecords;
+ if (config.getRecordType() == HoodieRecordType.AVRO) {
+ dedupedRecords =
+ dedupedKeys.map(key -> new HoodieAvroRecord(key, new
EmptyHoodieRecordPayload()));
+ } else if (config.getRecordType() == HoodieRecordType.SPARK) {
+ dedupedRecords = dedupedKeys.map(key -> {
+ Class<?> recordClazz =
ReflectionUtils.getClass("org.apache.hudi.commmon.model.HoodieSparkRecord");
+ Method method = recordClazz.getMethod("empty", HoodieKey.class);
+ return method.invoke(null, key);
Review Comment:
this reflection API call is gonna impact perf to quite some extent
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/index/HoodieIndexUtils.java:
##########
@@ -132,7 +133,8 @@ public static List<String> filterKeysFromFile(Path
filePath, List<String> candid
// Load all rowKeys from the file, to double-confirm
if (!candidateRecordKeys.isEmpty()) {
HoodieTimer timer = new HoodieTimer().startTimer();
- HoodieAvroFileReader fileReader =
HoodieFileReaderFactory.getFileReader(configuration, filePath);
+ // Just get row keys
Review Comment:
this comment looks redundant
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java:
##########
@@ -142,6 +146,11 @@ public class HoodieWriteConfig extends HoodieConfig {
.withDocumentation("Easily configure one the built-in key generators,
instead of specifying the key generator class."
+ "Currently supports SIMPLE, COMPLEX, TIMESTAMP, CUSTOM,
NON_PARTITION, GLOBAL_DELETE");
+ public static final ConfigProperty<String> RECORD_TYPE = ConfigProperty
+ .key("hoodie.datasource.write.record.type")
+ .defaultValue(HoodieRecordType.AVRO.toString())
+ .withDocumentation("test");
Review Comment:
state valid values for this config?
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java:
##########
@@ -977,6 +988,16 @@ public String getKeyGeneratorClass() {
return getString(KEYGENERATOR_CLASS_NAME);
}
+ public HoodieRecord.HoodieRecordType getRecordType() {
+ HoodieRecordType recordType =
HoodieRecord.HoodieRecordType.valueOf(getString(RECORD_TYPE));
+ String basePath = getString(BASE_PATH);
+ boolean metadataTable = HoodieTableMetadata.isMetadataTable(basePath);
+ if (metadataTable) {
+ recordType = HoodieRecordType.AVRO;
+ }
+ return recordType;
Review Comment:
looks like this `getRecordType()` is invoked everywhere throughout the write
path. we should optimize the perf here somehow. for e.g., the ENUM
instantiation , metadata table check. since this info is not gonna change
throughout a write operation, we should consider cache it.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]