hudi-agent commented on code in PR #19047:
URL: https://github.com/apache/hudi/pull/19047#discussion_r3461659886
##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/MergeOnReadIncrementalRelationV1.scala:
##########
@@ -267,8 +267,9 @@ trait HoodieIncrementalRelationV1Trait extends
HoodieBaseRelation {
s"option ${DataSourceReadOptions.START_COMMIT.key}")
}
- if (!this.tableConfig.populateMetaFields()) {
- throw new HoodieException("Incremental queries are not supported when
meta fields are disabled")
+ if (!this.tableConfig.isCommitTimePopulated()) {
Review Comment:
🤖 This also enables incremental queries on MoR tables in COMMIT_TIME_ONLY
mode, but the MoR log-write path (`HoodieAppendHandle.populateMetadataFields`)
still only writes `_hoodie_commit_time` when `populateMetaFields()` is true —
so log-file records get a null commit-time and would be silently dropped by the
`IsNotNull`/range filters in `incrementalSpanRecordFilters` (and the merge also
needs `_hoodie_record_key`, which is null in this mode). Should the mode be
restricted to CoW, or should the append handle populate commit-time too? Same
applies to V2. @yihua does this match your read of the MoR incremental path?
<sub><i>⚠️ AI-generated; verify before applying. React 👍/👎 to flag
quality.</i></sub>
##########
hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableConfig.java:
##########
@@ -324,6 +324,14 @@ public static final String getDefaultPayloadClassName() {
.withDocumentation("When enabled, populates all meta fields. When
disabled, no meta fields are populated "
+ "and incremental queries will not be functional. This is only
meant to be used for append only/immutable data for batch processing");
+ public static final ConfigProperty<Boolean> META_FIELDS_COMMIT_TIME_ENABLED
= ConfigProperty
Review Comment:
🤖 Is enabling this flag on a pre-existing `populate.meta.fields=false` table
meant to be guarded? `HoodieWriterUtils.validateTableConfig` only flags a
conflict when the existing value is non-null, so a null→true transition on an
older NONE table isn't blocked. After that, incremental queries over a range
that includes pre-enablement commits would silently drop those rows since their
`_hoodie_commit_time` is null on disk.
<sub><i>⚠️ AI-generated; verify before applying. React 👍/👎 to flag
quality.</i></sub>
##########
hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableConfig.java:
##########
@@ -324,6 +324,14 @@ public static final String getDefaultPayloadClassName() {
.withDocumentation("When enabled, populates all meta fields. When
disabled, no meta fields are populated "
+ "and incremental queries will not be functional. This is only
meant to be used for append only/immutable data for batch processing");
+ public static final ConfigProperty<Boolean> META_FIELDS_COMMIT_TIME_ENABLED
= ConfigProperty
+ .key("hoodie.meta.fields.commit.time.enabled")
+ .defaultValue(false)
+ .withDocumentation("Only meaningful when " +
"hoodie.populate.meta.fields=false. When set to true, the writer "
Review Comment:
🤖 nit: the `"Only meaningful when " +` splits a single thought across two
adjacent string literals for no reason — could you merge them into one string
so the concatenation isn't surprising to whoever reads this next?
<sub><i>⚠️ AI-generated; verify before applying. React 👍/👎 to flag
quality.</i></sub>
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java:
##########
@@ -3861,6 +3884,17 @@ private void validate() {
checkArgument(lookbackCommits >= 0,
String.format("%s must be non-negative, but was %d",
ROLLING_METADATA_TIMELINE_LOOKBACK_COMMITS.key(),
lookbackCommits));
+
+ // COMMIT_TIME_ONLY mode is an additive opt-in on top of
populate.meta.fields=false. Setting
+ // both flags to true is ambiguous (the COMMIT_TIME_ONLY flag has no
effect when all meta
+ // fields are already populated) so reject it explicitly rather than
silently ignore.
+ boolean populateMetaFields =
writeConfig.getBooleanOrDefault(HoodieTableConfig.POPULATE_META_FIELDS);
Review Comment:
🤖 nit: `writeConfig.populateMetaFields()` already encodes this read — could
you use it here instead of reaching through `getBooleanOrDefault` directly?
<sub><i>⚠️ AI-generated; verify before applying. React 👍/👎 to flag
quality.</i></sub>
##########
hudi-client/hudi-client-common/src/test/java/org/apache/hudi/config/TestHoodieWriteConfigMetaFieldsMode.java:
##########
@@ -0,0 +1,109 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.config;
+
+import org.junit.jupiter.api.Test;
+
+import static org.junit.jupiter.api.Assertions.assertEquals;
+import static org.junit.jupiter.api.Assertions.assertFalse;
+import static org.junit.jupiter.api.Assertions.assertThrows;
+import static org.junit.jupiter.api.Assertions.assertTrue;
+
+/**
+ * Validates the writer-side accessors and validation guard for the three
meta-field-population
+ * modes (ALL / NONE / COMMIT_TIME_ONLY) on {@link HoodieWriteConfig}. The
companion test for the
+ * {@link org.apache.hudi.common.table.HoodieTableConfig} accessors lives in
{@code
+ * TestHoodieMetaFieldsMode}; this test covers the writer-builder surface and
the cross-flag
+ * validation that runs at {@code build()} time.
+ */
+class TestHoodieWriteConfigMetaFieldsMode {
+
+ private static HoodieWriteConfig.Builder baseBuilder() {
+ return
HoodieWriteConfig.newBuilder().withPath("file:///tmp/test_hudi_meta_fields_mode");
+ }
+
+ @Test
+ void defaultsToAllMode() {
+ HoodieWriteConfig cfg = baseBuilder().build();
+ assertTrue(cfg.populateMetaFields());
+ assertFalse(cfg.isCommitTimeOnlyMetaFieldsMode());
+ assertTrue(cfg.isCommitTimePopulated());
+ }
+
+ @Test
+ void explicitNoneModeBuilds() {
+ HoodieWriteConfig cfg =
baseBuilder().withPopulateMetaFields(false).build();
+ assertFalse(cfg.populateMetaFields());
+ assertFalse(cfg.isCommitTimeOnlyMetaFieldsMode());
+ assertFalse(cfg.isCommitTimePopulated());
+ }
+
+ @Test
+ void commitTimeOnlyModeBuildsAndIsAdditiveOverNone() {
+ HoodieWriteConfig cfg = baseBuilder()
+ .withPopulateMetaFields(false)
+ .withMetaFieldsCommitTimeEnabled(true)
+ .build();
+ assertFalse(cfg.populateMetaFields());
+ assertTrue(cfg.isCommitTimeOnlyMetaFieldsMode(),
+ "COMMIT_TIME_ONLY must engage when populate.meta.fields=false and
commit.time.enabled=true");
+ assertTrue(cfg.isCommitTimePopulated(),
+ "incremental query semantics depend on _hoodie_commit_time being
populated in this mode");
+ }
+
+ @Test
+ void commitTimeFlagIgnoredWhenPopulateMetaFieldsIsTrueByConstruction() {
+ // The validation guard rejects the combination at build() time, so the
only way to observe
+ // both flags being truthy together is to bypass the builder's validate()
— i.e., the
+ // accessor must still report ALL semantics rather than COMMIT_TIME_ONLY
if the bad combo
+ // ever leaks through (defensive correctness).
+ HoodieWriteConfig cfg = baseBuilder()
+ .withPopulateMetaFields(true)
+ // commit.time.enabled defaults to false; setting it false is allowed.
+ .withMetaFieldsCommitTimeEnabled(false)
+ .build();
+ assertTrue(cfg.populateMetaFields());
+ assertFalse(cfg.isCommitTimeOnlyMetaFieldsMode());
+ }
+
+ @Test
+ void rejectsIncompatibleCombination() {
+ // populate.meta.fields=true together with the COMMIT_TIME_ONLY opt-in is
ambiguous (the new
+ // flag has no effect when all meta fields are already populated); reject
loudly.
+ HoodieWriteConfig.Builder builder = baseBuilder()
+ .withPopulateMetaFields(true)
+ .withMetaFieldsCommitTimeEnabled(true);
+ IllegalArgumentException ex = assertThrows(IllegalArgumentException.class,
builder::build);
+
assertTrue(ex.getMessage().contains("hoodie.meta.fields.commit.time.enabled"),
+ "exception must name the new property: " + ex.getMessage());
+ assertTrue(ex.getMessage().contains("hoodie.populate.meta.fields"),
+ "exception must name the legacy property too: " + ex.getMessage());
+ }
+
+ @Test
+ void noneModeWithCommitTimeFlagFalseExplicitIsStillNone() {
+ HoodieWriteConfig cfg = baseBuilder()
+ .withPopulateMetaFields(false)
+ .withMetaFieldsCommitTimeEnabled(false)
+ .build();
+ assertEquals(false, cfg.populateMetaFields());
+ assertEquals(false, cfg.isCommitTimeOnlyMetaFieldsMode());
Review Comment:
🤖 nit: the three `assertEquals(false, ...)` calls here are inconsistent with
`assertFalse(...)` used everywhere else in this class — could you swap them for
`assertFalse` to match?
<sub><i>⚠️ AI-generated; verify before applying. React 👍/👎 to flag
quality.</i></sub>
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]