the-other-tim-brown commented on code in PR #13526:
URL: https://github.com/apache/hudi/pull/13526#discussion_r2231170119
##########
hudi-common/src/main/java/org/apache/hudi/avro/HoodieAvroUtils.java:
##########
@@ -1499,21 +1503,21 @@ private static Schema getActualSchemaFromUnion(Schema schema, Object data) {
public static HoodieRecord createHoodieRecordFromAvro(
IndexedRecord data,
String payloadClass,
- String preCombineField,
+ Option<String[]> preCombineFields,
Review Comment:
Should we have this be an empty array if there are no ordering fields?
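To illustrate the suggestion: representing "no ordering fields" as an empty array lets callers iterate unconditionally instead of checking `Option.isPresent()` first. This is a hedged sketch with hypothetical helper and class names, not the actual Hudi API:

```java
import java.util.Arrays;

// Hypothetical sketch: normalizing "no ordering fields" to an empty array
// (instead of Option<String[]>) removes the presence check at call sites.
public class OrderingFieldsSketch {
  static final String[] NO_ORDERING_FIELDS = new String[0];

  // Assumed helper: turn a possibly-null comma-separated config value
  // into a non-null array, dropping blank entries.
  static String[] toOrderingFields(String commaSeparated) {
    if (commaSeparated == null || commaSeparated.isEmpty()) {
      return NO_ORDERING_FIELDS; // empty array, no Option wrapper needed
    }
    return Arrays.stream(commaSeparated.split(","))
        .map(String::trim)
        .filter(s -> !s.isEmpty())
        .toArray(String[]::new);
  }

  public static void main(String[] args) {
    // Callers can loop without unwrapping anything; the empty case
    // simply iterates zero times.
    for (String f : toOrderingFields(null)) {
      System.out.println(f);
    }
    System.out.println(Arrays.toString(toOrderingFields("timestamp,rider")));
  }
}
```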
##########
hudi-common/src/main/java/org/apache/hudi/avro/HoodieAvroUtils.java:
##########
@@ -1659,6 +1666,11 @@ public static Comparable<?> unwrapAvroValueWrapper(Object avroValueWrapper) {
return ((BytesWrapper) avroValueWrapper).getValue();
} else if (avroValueWrapper instanceof StringWrapper) {
return ((StringWrapper) avroValueWrapper).getValue();
+ } else if (avroValueWrapper instanceof ArrayWrapper) {
Review Comment:
The code still assumes that the caller will always want the array wrapper converted to OrderingValues. Is that safe? This is a generic util class, so it can be called from anywhere; it is not confined to ordering-field use cases.
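To make the concern concrete: a generic unwrap utility that hard-codes a domain-specific conversion surprises callers outside that code path. One alternative is element-wise unwrapping that returns a plain list, leaving any ordering-specific conversion to the caller. The classes below are illustrative stand-ins, not the Hudi types:

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch: a generic unwrapper recurses into array elements
// without imposing a domain type such as OrderingValues, so non-ordering
// callers get back an ordinary List.
public class UnwrapSketch {
  // Stand-in for an Avro value wrapper holding a list of wrapped elements.
  static class ArrayWrapper {
    final List<Object> values;
    ArrayWrapper(List<Object> values) { this.values = values; }
  }

  // Generic unwrap: arrays become plain lists of unwrapped elements;
  // everything else passes through unchanged.
  static Object unwrap(Object wrapper) {
    if (wrapper instanceof ArrayWrapper) {
      return ((ArrayWrapper) wrapper).values.stream()
          .map(UnwrapSketch::unwrap)
          .collect(Collectors.toList());
    }
    return wrapper;
  }

  public static void main(String[] args) {
    // The ordering-field caller can convert this list afterwards;
    // other callers just use it as-is.
    System.out.println(unwrap(new ArrayWrapper(List.of(1, "a"))));
  }
}
```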
##########
hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/realtime/HoodieParquetRealtimeInputFormat.java:
##########
@@ -115,10 +115,8 @@ void addProjectionToJobConf(final RealtimeSplit realtimeSplit, final JobConf job
List<String> fieldsToAdd = new ArrayList<>();
if (!realtimeSplit.getDeltaLogPaths().isEmpty()) {
HoodieRealtimeInputFormatUtils.addVirtualKeysProjection(jobConf, realtimeSplit.getVirtualKeyInfo());
- String preCombineKey = tableConfig.getPreCombineField();
- if (!StringUtils.isNullOrEmpty(preCombineKey)) {
- fieldsToAdd.add(preCombineKey);
- }
+      List<String> preCombineKeys = tableConfig.getPreCombineFields();
+      preCombineKeys.stream().filter(StringUtils::nonEmpty).forEach(fieldsToAdd::add);
Review Comment:
The table config should already be filtering out empty strings, right? If so, I think we can remove this check here.
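The suggestion is to filter once at the source of truth rather than at every call site. A hedged sketch of that shape, with illustrative names rather than the actual Hudi accessors:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch: if the table-config accessor drops empty entries
// when parsing the comma-separated value, call sites no longer need a
// second nonEmpty filter.
public class PreCombineConfigSketch {
  // Assumed accessor: parse and filter in one place.
  static List<String> getPreCombineFields(String rawConfigValue) {
    if (rawConfigValue == null) {
      return List.of();
    }
    return Arrays.stream(rawConfigValue.split(","))
        .map(String::trim)
        .filter(s -> !s.isEmpty()) // filtering lives here ...
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<String> fieldsToAdd = new ArrayList<>();
    // ... so the call site can add fields without re-checking.
    fieldsToAdd.addAll(getPreCombineFields("timestamp,,rider"));
    System.out.println(fieldsToAdd);
  }
}
```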
##########
hudi-common/src/test/java/org/apache/hudi/common/table/read/TestHoodieFileGroupReaderBase.java:
##########
@@ -183,6 +187,48 @@ public void testReadFileGroupInMergeOnReadTable(RecordMergeMode recordMergeMode,
}
}
+  @Test
+  public void testReadFileGroupWithMultipleOrderingFields() throws Exception {
+    RecordMergeMode recordMergeMode = RecordMergeMode.EVENT_TIME_ORDERING;
+    Map<String, String> writeConfigs = new HashMap<>(getCommonConfigs(recordMergeMode, true));
+    writeConfigs.put(HoodieStorageConfig.LOGFILE_DATA_BLOCK_FORMAT.key(), "avro");
+    writeConfigs.put("hoodie.datasource.write.table.type", HoodieTableType.MERGE_ON_READ.name());
+    // Use two precombine values - combination of timestamp and rider
+    String orderingValues = "timestamp,rider";
+    writeConfigs.put("hoodie.datasource.write.precombine.field", orderingValues);
+    writeConfigs.put("hoodie.payload.ordering.field", orderingValues);
+
+    try (HoodieTestDataGenerator dataGen = new HoodieTestDataGenerator(0xDEEF)) {
+      // Initial commit. rider column gets value of rider-002
+      List<HoodieRecord> initialRecords = dataGen.generateInserts("002", 100);
+      commitToTable(initialRecords, INSERT.value(), true, writeConfigs);
+      validateOutputFromFileGroupReader(
+          getStorageConf(), getBasePath(), true, 0, recordMergeMode,
+          initialRecords, initialRecords);
+
+      // The updates have rider values as rider-001 and the existing records have rider values as rider-002
Review Comment:
Can you add a batch of timestamp 1 and rider-001 to show that the first
value ordering is working properly?
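The point behind the request: with two ordering fields the comparison is lexicographic, so the second field (rider) only breaks ties in the first (timestamp). A batch with a lower timestamp exercises the first field on its own. A minimal plain-Java sketch of that semantics (not the Hudi merge path):

```java
import java.util.Comparator;

// Minimal sketch of lexicographic ordering over (timestamp, rider):
// rider matters only when timestamps tie, so a batch at timestamp 1
// tests the first ordering field independently of the second.
public class MultiFieldOrderingSketch {
  record Rec(long timestamp, String rider) {}

  static final Comparator<Rec> ORDERING =
      Comparator.comparingLong(Rec::timestamp)
          .thenComparing(Rec::rider);

  public static void main(String[] args) {
    Rec existing = new Rec(2, "rider-002");
    Rec olderTimestamp = new Rec(1, "rider-001");
    Rec sameTsLowerRider = new Rec(2, "rider-001");
    // A lower timestamp loses regardless of its rider value ...
    System.out.println(ORDERING.compare(olderTimestamp, existing) < 0);
    // ... while rider breaks ties only when timestamps are equal.
    System.out.println(ORDERING.compare(sameTsLowerRider, existing) < 0);
  }
}
```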
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]