hudi-agent commented on code in PR #19051:
URL: https://github.com/apache/hudi/pull/19051#discussion_r3459426352
##########
hudi-spark-datasource/hudi-spark/src/test/java/org/apache/hudi/client/functional/TestHoodieClientOnCopyOnWriteStorage.java:
##########
@@ -2152,6 +2154,61 @@ public void testRollingMetadataPreservedInCleanCommits()
throws Exception {
"Schema should be rolled over into new commit even with lookback=1");
}
+ @Test
+ public void
testExecutingPendingCleanInstantsBeforeSchedulingNewCleanInstant() throws
Exception {
+ Properties props = new Properties();
+ props.setProperty("hoodie.clean.automatic", "false");
+ HoodieWriteConfig cfg = getConfigBuilder().withProperties(props).build();
+ SparkRDDWriteClient client = getHoodieWriteClient(cfg);
+
+ // Bulk insert
+ String firstCommit = WriteClientTestUtils.createNewInstantTime();
+ insertFirstBatch(cfg, client, firstCommit, "000", 100,
SparkRDDWriteClient::bulkInsert,
+ false, false, 100, INSTANT_GENERATOR);
+
+ // First upsert
+ String secondCommit = WriteClientTestUtils.createNewInstantTime();
+ updateBatch(cfg, client, secondCommit, firstCommit, Option.empty(), "000",
100,
+ SparkRDDWriteClient::upsert, false, true, 100, 100, 2,
INSTANT_GENERATOR);
+
+ // Second upsert
+ String thirdCommit = WriteClientTestUtils.createNewInstantTime();
+ updateBatch(cfg, client, thirdCommit, secondCommit, Option.empty(), "000",
100,
+ SparkRDDWriteClient::upsert, false, true, 100, 100, 3,
INSTANT_GENERATOR);
+
+ // Schedule clean operation
+ Properties cleanProps = new Properties();
+ cleanProps.setProperty("hoodie.clean.automatic", "true");
+ cleanProps.setProperty("hoodie.clean.commits.retained", "1");
+ cleanProps.setProperty("hoodie.clean.multiple.enabled", "false");
+ HoodieWriteConfig cleanCfgs =
getConfigBuilder().withProperties(cleanProps).build();
+ SparkRDDWriteClient cleanClient = new SparkRDDWriteClient(context,
cleanCfgs);
+ cleanClient.scheduleTableService(Option.empty(), TableServiceType.CLEAN);
+
+ // Verify whether clean operation is scheduled.
+ Option<HoodieInstant> firsCleanInstant =
metaClient.reloadActiveTimeline().lastInstant();
Review Comment:
🤖 nit: `firsCleanInstant` looks like a typo — could you rename it to
`firstCleanInstant`? It's used a few times below so the typo is pretty visible.
<sub><i>⚠️ AI-generated; verify before applying. React 👍/👎 to flag
quality.</i></sub>
##########
hudi-spark-datasource/hudi-spark/src/test/java/org/apache/hudi/client/functional/TestHoodieClientOnCopyOnWriteStorage.java:
##########
@@ -2152,6 +2154,61 @@ public void testRollingMetadataPreservedInCleanCommits()
throws Exception {
"Schema should be rolled over into new commit even with lookback=1");
}
+ @Test
+ public void
testExecutingPendingCleanInstantsBeforeSchedulingNewCleanInstant() throws
Exception {
+ Properties props = new Properties();
+ props.setProperty("hoodie.clean.automatic", "false");
+ HoodieWriteConfig cfg = getConfigBuilder().withProperties(props).build();
+ SparkRDDWriteClient client = getHoodieWriteClient(cfg);
+
+ // Bulk insert
+ String firstCommit = WriteClientTestUtils.createNewInstantTime();
+ insertFirstBatch(cfg, client, firstCommit, "000", 100,
SparkRDDWriteClient::bulkInsert,
+ false, false, 100, INSTANT_GENERATOR);
+
+ // First upsert
+ String secondCommit = WriteClientTestUtils.createNewInstantTime();
+ updateBatch(cfg, client, secondCommit, firstCommit, Option.empty(), "000",
100,
+ SparkRDDWriteClient::upsert, false, true, 100, 100, 2,
INSTANT_GENERATOR);
+
+ // Second upsert
+ String thirdCommit = WriteClientTestUtils.createNewInstantTime();
+ updateBatch(cfg, client, thirdCommit, secondCommit, Option.empty(), "000",
100,
+ SparkRDDWriteClient::upsert, false, true, 100, 100, 3,
INSTANT_GENERATOR);
+
+ // Schedule clean operation
+ Properties cleanProps = new Properties();
+ cleanProps.setProperty("hoodie.clean.automatic", "true");
+ cleanProps.setProperty("hoodie.clean.commits.retained", "1");
+ cleanProps.setProperty("hoodie.clean.multiple.enabled", "false");
+ HoodieWriteConfig cleanCfgs =
getConfigBuilder().withProperties(cleanProps).build();
+ SparkRDDWriteClient cleanClient = new SparkRDDWriteClient(context,
cleanCfgs);
+ cleanClient.scheduleTableService(Option.empty(), TableServiceType.CLEAN);
+
+ // Verify whether clean operation is scheduled.
+ Option<HoodieInstant> firsCleanInstant =
metaClient.reloadActiveTimeline().lastInstant();
+ assertTrue(firsCleanInstant.isPresent());
+ assertEquals(CLEAN_ACTION, firsCleanInstant.get().getAction());
+
+ // Second upsert
Review Comment:
🤖 nit: this comment says `// Second upsert` but the second upsert was
`thirdCommit` a few lines up — this one is actually the third upsert. Could you
update it to `// Third upsert` to keep the sequence labels accurate?
<sub><i>⚠️ AI-generated; verify before applying. React 👍/👎 to flag
quality.</i></sub>
##########
hudi-spark-datasource/hudi-spark/src/test/java/org/apache/hudi/client/functional/TestHoodieClientOnCopyOnWriteStorage.java:
##########
@@ -2152,6 +2154,61 @@ public void testRollingMetadataPreservedInCleanCommits()
throws Exception {
"Schema should be rolled over into new commit even with lookback=1");
}
+ @Test
+ public void
testExecutingPendingCleanInstantsBeforeSchedulingNewCleanInstant() throws
Exception {
+ Properties props = new Properties();
+ props.setProperty("hoodie.clean.automatic", "false");
+ HoodieWriteConfig cfg = getConfigBuilder().withProperties(props).build();
+ SparkRDDWriteClient client = getHoodieWriteClient(cfg);
+
+ // Bulk insert
+ String firstCommit = WriteClientTestUtils.createNewInstantTime();
+ insertFirstBatch(cfg, client, firstCommit, "000", 100,
SparkRDDWriteClient::bulkInsert,
+ false, false, 100, INSTANT_GENERATOR);
+
+ // First upsert
+ String secondCommit = WriteClientTestUtils.createNewInstantTime();
+ updateBatch(cfg, client, secondCommit, firstCommit, Option.empty(), "000",
100,
+ SparkRDDWriteClient::upsert, false, true, 100, 100, 2,
INSTANT_GENERATOR);
+
+ // Second upsert
+ String thirdCommit = WriteClientTestUtils.createNewInstantTime();
+ updateBatch(cfg, client, thirdCommit, secondCommit, Option.empty(), "000",
100,
+ SparkRDDWriteClient::upsert, false, true, 100, 100, 3,
INSTANT_GENERATOR);
+
+ // Schedule clean operation
+ Properties cleanProps = new Properties();
+ cleanProps.setProperty("hoodie.clean.automatic", "true");
+ cleanProps.setProperty("hoodie.clean.commits.retained", "1");
+ cleanProps.setProperty("hoodie.clean.multiple.enabled", "false");
+ HoodieWriteConfig cleanCfgs =
getConfigBuilder().withProperties(cleanProps).build();
Review Comment:
🤖 nit: `cleanCfgs` is plural but it holds a single `HoodieWriteConfig` —
could you rename it to `cleanCfg` to match the singular `cfg` used above?
<sub><i>⚠️ AI-generated; verify before applying. React 👍/👎 to flag
quality.</i></sub>
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]