hudi-agent commented on code in PR #18384:
URL: https://github.com/apache/hudi/pull/18384#discussion_r3277658155
##########
hudi-common/src/main/java/org/apache/hudi/io/storage/HoodieAvroFileWriter.java:
##########
@@ -56,4 +57,21 @@ default void prepRecordWithMetadata(HoodieKey key,
IndexedRecord avroRecord, Str
HoodieAvroUtils.addHoodieKeyToRecord((GenericRecord) avroRecord,
key.getRecordKey(), key.getPartitionPath(), fileName);
HoodieAvroUtils.addCommitMetadataToRecord((GenericRecord) avroRecord,
instantTime, seqId);
}
+
+ /**
+ * Selectively populates meta fields based on the provided {@link
HoodieMetaFieldFlags} flags.
+ * Fields that are not populated are explicitly set to null so that any
pre-existing values on
+ * {@code avroRecord} are cleared and the on-disk output is deterministic.
+ */
+ default void prepRecordWithMetadata(HoodieKey key, IndexedRecord avroRecord,
String instantTime,
+ Integer partitionId, long recordIndex, String fileName,
HoodieMetaFieldFlags populateFlags) {
+ GenericRecord rec = (GenericRecord) avroRecord;
+ String seqId = populateFlags.isCommitSeqNoPopulated()
+ ? HoodieRecord.generateSequenceId(instantTime, partitionId,
recordIndex) : null;
+ rec.put(HoodieRecord.COMMIT_TIME_METADATA_FIELD,
populateFlags.getCommitTime(instantTime));
Review Comment:
🤖 nit: `populateFlags.getCommitTime(instantTime)` reads like a plain getter
but actually returns either `instantTime` or `null` depending on the flag.
Consider renaming these helpers (e.g. `commitTimeOrNull`, `recordKeyOrNull`,
...) so the conditional semantics are obvious at the call site — or just inline
the ternary here to match how `seqId` is handled two lines above.
<sub><i>- AI-generated; verify before applying. React 👍/👎 to flag
quality.</i></sub>
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieAbstractMergeHandle.java:
##########
@@ -143,8 +144,17 @@ private void initWriteStatus(String fileId, String
partitionPath) {
setWriteStatusPath();
}
- private void validateAndSetAndKeyGenProps(Option<BaseKeyGenerator>
keyGeneratorOpt, boolean populateMetaFields) {
- ValidationUtils.checkArgument(populateMetaFields ==
!keyGeneratorOpt.isPresent());
+ private void validateAndSetAndKeyGenProps(Option<BaseKeyGenerator>
keyGeneratorOpt,
+ boolean keyGeneratorRequired) {
+ // Invariant: the merge handle reconstructs the record key and/or
partition path of old
+ // records via the supplied BaseKeyGenerator when those meta columns are
not populated
+ // on disk - either populate.meta.fields=false or the field is in
META_FIELDS_EXCLUDE_LIST.
+ // When both columns are populated, the values come from the base file and
no key
+ // generator is needed.
+ ValidationUtils.checkArgument(keyGeneratorRequired ==
keyGeneratorOpt.isPresent(),
+ "Merge handle keyGenerator presence must mirror meta-field population:
"
+ + "keyGeneratorRequired=" + keyGeneratorRequired
Review Comment:
🤖 This assertion is strictly stronger than the previous `populateMetaFields
== !keyGeneratorOpt.isPresent()` check in the new case where
`populateMetaFields=true` AND a meta column is in `META_FIELDS_EXCLUDE_LIST`
(so `keyGeneratorRequired=true`). Any merge-handle constructor call site that
still passes `Option.empty()` for `keyGeneratorOpt` (e.g. compaction/clustering
paths not shown in this chunk) will now fail this `checkArgument` at runtime as
soon as a user enables exclusion. Have all merge-handle creation sites been
audited and updated to source `keyGeneratorOpt` from
`isKeyGeneratorRequired()`? @yihua could you confirm coverage of the compaction
and clustering execution paths before this lands?
<sub><i>- AI-generated; verify before applying. React 👍/👎 to flag
quality.</i></sub>
##########
hudi-client/hudi-client-common/src/test/java/org/apache/hudi/avro/TestHoodieAvroParquetWriter.java:
##########
@@ -75,7 +75,8 @@ public void testProperWriting() throws IOException {
StoragePath filePath = new
StoragePath(tmpDir.resolve("test.parquet").toAbsolutePath().toString());
try (HoodieAvroParquetWriter writer =
- new HoodieAvroParquetWriter(filePath, parquetConfig, "001", new
LocalTaskContextSupplier(), true)) {
+ new HoodieAvroParquetWriter(filePath, parquetConfig, "001", new
LocalTaskContextSupplier(), true,
+
org.apache.hudi.common.model.HoodieMetaFieldFlags.allPopulated())) {
Review Comment:
🤖 nit: could you add an import for `HoodieMetaFieldFlags` instead of using
the fully qualified name? Same fix applies to `HoodieWriteableTestTable.java`,
`TestHoodieOrcReaderWriter.java`, and
`TestSparkBinaryCopyClusteringAndValidationMeta.java`, which all spell out
`org.apache.hudi.common.model.HoodieMetaFieldFlags.allPopulated()` inline.
<sub><i>- AI-generated; verify before applying. React 👍/👎 to flag
quality.</i></sub>
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]