Re: [PR] fix: Fix classloader issue during pre-commit validations [hudi]

via GitHub Thu, 21 May 2026 22:47:04 -0700


hudi-agent commented on code in PR #18585:
URL: https://github.com/apache/hudi/pull/18585#discussion_r3286117689



##########
hudi-common/src/main/java/org/apache/hudi/common/engine/HoodieLocalEngineContext.java:
##########
@@ -63,6 +69,72 @@
  */
 public final class HoodieLocalEngineContext extends HoodieEngineContext {
 
+  private static final Logger LOG = 
LoggerFactory.getLogger(HoodieLocalEngineContext.class);
+
+  // When running in Java 11+, the common ForkJoinPool's workers don't inherit 
the application
+  // classloader, causing ClassNotFoundExceptions. We use a custom pool whose 
thread factory
+  // explicitly sets the correct classloader.
+  // See: 
https://stackoverflow.com/questions/66240365/java-11-upgrade-from-8-parallel-streams-throws-classnotfoundexception
+  // Lazy holder — initialized on first use, not at class-load time (JLS 
12.4.2 guarantees thread safety).
+  private static class PoolHolder {
+    static final ForkJoinPool INSTANCE = initForkJoinPool();
+  }
+
+  private static ForkJoinPool initForkJoinPool() {

Review Comment:
   🤖 nit: `initForkJoinPool` currently does two distinct things — parsing the 
Java version (the try/catch block) and then deciding which pool to construct. 
Could you extract the version parsing into a small `detectJavaVersion()` 
helper? It would make each piece easier to read and test in isolation.
   
   <sub><i>- AI-generated; verify before applying. React 👍/👎 to flag 
quality.</i></sub>



##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/BaseHoodieWriteClient.java:
##########
@@ -919,6 +919,35 @@ public HoodieRestoreMetadata restoreToInstant(final String 
savepointToRestoreTim
     Timer.Context timerContext = metrics.getRollbackCtx();
     try {
       HoodieTable<T, I, K, O> table = initTable(WriteOperationType.UNKNOWN, 
Option.empty(), initialMetadataTableIfNecessary);
+      // We cannot restore to a point older than the penultimate compaction on 
the MDT,
+      // as that would leave the metadata table in an inconsistent state.
+      // Skip this check when the target is a valid savepoint — both 
restoreToSavepoint() and
+      // direct restoreToInstant() calls with a savepoint are safe because the 
restore operation
+      // properly rolls back the MDT alongside the data table.
+      boolean isSavepointRestore = 
table.getCompletedSavepointTimeline().getInstants()
+          .stream().anyMatch(i -> 
i.requestedTime().equals(savepointToRestoreTimestamp));
+      if (!isSavepointRestore && 
table.getMetaClient().getTableConfig().isMetadataTableAvailable()) {
+        try {
+          HoodieTableMetaClient mdtMetaClient = HoodieTableMetaClient.builder()
+                  .setConf(storageConf.newInstance())
+                  
.setBasePath(getMetadataTableBasePath(config.getBasePath())).build();
+          List<HoodieInstant> completedCompactions = 
mdtMetaClient.getCommitTimeline().filterCompletedInstants().getInstants();
+          if (!completedCompactions.isEmpty()) {
+            int boundaryIdx = Math.max(0, completedCompactions.size() - 2);

Review Comment:
   🤖 For the `completedCompactions.size() == 1` case, `Math.max(0, -1) = 0` 
picks the only compaction as the boundary — so you can never restore to ≤ the 
single MDT compaction, even though the comment talks about the "penultimate". 
Is this asymmetry with the size≥2 case (where restore *at* the latest 
compaction is allowed) intentional? Worth a comment if yes, since it's not 
obvious from the code.
   
   <sub><i>- AI-generated; verify before applying. React 👍/👎 to flag 
quality.</i></sub>



##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/BaseHoodieWriteClient.java:
##########
@@ -919,6 +919,35 @@ public HoodieRestoreMetadata restoreToInstant(final String 
savepointToRestoreTim
     Timer.Context timerContext = metrics.getRollbackCtx();
     try {
       HoodieTable<T, I, K, O> table = initTable(WriteOperationType.UNKNOWN, 
Option.empty(), initialMetadataTableIfNecessary);
+      // We cannot restore to a point older than the penultimate compaction on 
the MDT,
+      // as that would leave the metadata table in an inconsistent state.
+      // Skip this check when the target is a valid savepoint — both 
restoreToSavepoint() and
+      // direct restoreToInstant() calls with a savepoint are safe because the 
restore operation
+      // properly rolls back the MDT alongside the data table.
+      boolean isSavepointRestore = 
table.getCompletedSavepointTimeline().getInstants()
+          .stream().anyMatch(i -> 
i.requestedTime().equals(savepointToRestoreTimestamp));
+      if (!isSavepointRestore && 
table.getMetaClient().getTableConfig().isMetadataTableAvailable()) {
+        try {
+          HoodieTableMetaClient mdtMetaClient = HoodieTableMetaClient.builder()
+                  .setConf(storageConf.newInstance())
+                  
.setBasePath(getMetadataTableBasePath(config.getBasePath())).build();
+          List<HoodieInstant> completedCompactions = 
mdtMetaClient.getCommitTimeline().filterCompletedInstants().getInstants();
+          if (!completedCompactions.isEmpty()) {
+            int boundaryIdx = Math.max(0, completedCompactions.size() - 2);
+            String boundaryCompactionTimestamp = 
completedCompactions.get(boundaryIdx).requestedTime();
+            if (LESSER_THAN_OR_EQUALS.test(savepointToRestoreTimestamp, 
boundaryCompactionTimestamp)) {
+              throw new HoodieRestoreException(String.format(
+                  "Restoring to a commit %s older than boundary compaction 
commit %s would leave the MDT inconsistent."
+                  + " Failing the restore request, as metadata table bootstrap 
may be required.",
+                  savepointToRestoreTimestamp, boundaryCompactionTimestamp));
+            }
+          }
+        } catch (HoodieRestoreException e) {
+          throw e;
+        } catch (Exception e) {

Review Comment:
   🤖 This broad `catch (Exception e)` swallows everything — including 
programming errors like NPE or ClassCastException — and falls back to the 
unguarded restore path, just logging a warning. If the MDT is genuinely broken, 
that's exactly the case where we *shouldn't* silently proceed. Could you narrow 
this to the expected I/O failures (e.g., `IOException`, `HoodieIOException`) so 
unexpected errors still surface?
   
   <sub><i>- AI-generated; verify before applying. React 👍/👎 to flag 
quality.</i></sub>



##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/BaseHoodieWriteClient.java:
##########
@@ -919,6 +919,35 @@ public HoodieRestoreMetadata restoreToInstant(final String 
savepointToRestoreTim
     Timer.Context timerContext = metrics.getRollbackCtx();
     try {
       HoodieTable<T, I, K, O> table = initTable(WriteOperationType.UNKNOWN, 
Option.empty(), initialMetadataTableIfNecessary);
+      // We cannot restore to a point older than the penultimate compaction on 
the MDT,
+      // as that would leave the metadata table in an inconsistent state.
+      // Skip this check when the target is a valid savepoint — both 
restoreToSavepoint() and
+      // direct restoreToInstant() calls with a savepoint are safe because the 
restore operation
+      // properly rolls back the MDT alongside the data table.
+      boolean isSavepointRestore = 
table.getCompletedSavepointTimeline().getInstants()
+          .stream().anyMatch(i -> 
i.requestedTime().equals(savepointToRestoreTimestamp));
+      if (!isSavepointRestore && 
table.getMetaClient().getTableConfig().isMetadataTableAvailable()) {
+        try {
+          HoodieTableMetaClient mdtMetaClient = HoodieTableMetaClient.builder()
+                  .setConf(storageConf.newInstance())
+                  
.setBasePath(getMetadataTableBasePath(config.getBasePath())).build();
+          List<HoodieInstant> completedCompactions = 
mdtMetaClient.getCommitTimeline().filterCompletedInstants().getInstants();
+          if (!completedCompactions.isEmpty()) {
+            int boundaryIdx = Math.max(0, completedCompactions.size() - 2);
+            String boundaryCompactionTimestamp = 
completedCompactions.get(boundaryIdx).requestedTime();
+            if (LESSER_THAN_OR_EQUALS.test(savepointToRestoreTimestamp, 
boundaryCompactionTimestamp)) {

Review Comment:
   🤖 The comment above says "older than the penultimate compaction" (strict 
less-than), but `LESSER_THAN_OR_EQUALS` also blocks the case where 
`savepointToRestoreTimestamp` exactly equals `boundaryCompactionTimestamp`. The 
error message string repeats the "older than" wording. Is the equals-case 
intentionally blocked, or should this be `LESSER_THAN`? @nsivabalan could you 
weigh in on the intended boundary semantics for MDT consistency?
   
   <sub><i>- AI-generated; verify before applying. React 👍/👎 to flag 
quality.</i></sub>



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [PR] fix: Fix classloader issue during pre-commit validations [hudi]

Reply via email to