hudi-agent commented on code in PR #18585:
URL: https://github.com/apache/hudi/pull/18585#discussion_r3286117689
##########
hudi-common/src/main/java/org/apache/hudi/common/engine/HoodieLocalEngineContext.java:
##########
@@ -63,6 +69,72 @@
*/
public final class HoodieLocalEngineContext extends HoodieEngineContext {
+ private static final Logger LOG =
LoggerFactory.getLogger(HoodieLocalEngineContext.class);
+
+ // When running in Java 11+, the common ForkJoinPool's workers don't inherit
the application
+ // classloader, causing ClassNotFoundExceptions. We use a custom pool whose
thread factory
+ // explicitly sets the correct classloader.
+ // See:
https://stackoverflow.com/questions/66240365/java-11-upgrade-from-8-parallel-streams-throws-classnotfoundexception
+ // Lazy holder — initialized on first use, not at class-load time (JLS
12.4.2 guarantees thread safety).
+ private static class PoolHolder {
+ static final ForkJoinPool INSTANCE = initForkJoinPool();
+ }
+
+ private static ForkJoinPool initForkJoinPool() {
Review Comment:
🤖 nit: `initForkJoinPool` currently does two distinct things — parsing the
Java version (the try/catch block) and then deciding which pool to construct.
Could you extract the version parsing into a small `detectJavaVersion()`
helper? It would make each piece easier to read and test in isolation.
<sub><i>- AI-generated; verify before applying. React 👍/👎 to flag
quality.</i></sub>
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/BaseHoodieWriteClient.java:
##########
@@ -919,6 +919,35 @@ public HoodieRestoreMetadata restoreToInstant(final String
savepointToRestoreTim
Timer.Context timerContext = metrics.getRollbackCtx();
try {
HoodieTable<T, I, K, O> table = initTable(WriteOperationType.UNKNOWN,
Option.empty(), initialMetadataTableIfNecessary);
+ // We cannot restore to a point older than the penultimate compaction on
the MDT,
+ // as that would leave the metadata table in an inconsistent state.
+ // Skip this check when the target is a valid savepoint — both
restoreToSavepoint() and
+ // direct restoreToInstant() calls with a savepoint are safe because the
restore operation
+ // properly rolls back the MDT alongside the data table.
+ boolean isSavepointRestore =
table.getCompletedSavepointTimeline().getInstants()
+ .stream().anyMatch(i ->
i.requestedTime().equals(savepointToRestoreTimestamp));
+ if (!isSavepointRestore &&
table.getMetaClient().getTableConfig().isMetadataTableAvailable()) {
+ try {
+ HoodieTableMetaClient mdtMetaClient = HoodieTableMetaClient.builder()
+ .setConf(storageConf.newInstance())
+
.setBasePath(getMetadataTableBasePath(config.getBasePath())).build();
+ List<HoodieInstant> completedCompactions =
mdtMetaClient.getCommitTimeline().filterCompletedInstants().getInstants();
+ if (!completedCompactions.isEmpty()) {
+ int boundaryIdx = Math.max(0, completedCompactions.size() - 2);
Review Comment:
🤖 For the `completedCompactions.size() == 1` case, `Math.max(0, -1) = 0`
picks the only compaction as the boundary — so you can never restore to ≤ the
single MDT compaction, even though the comment talks about the "penultimate".
Is this asymmetry with the size≥2 case (where restore *at* the latest
compaction is allowed) intentional? Worth a comment if yes, since it's not
obvious from the code.
<sub><i>- AI-generated; verify before applying. React 👍/👎 to flag
quality.</i></sub>
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/BaseHoodieWriteClient.java:
##########
@@ -919,6 +919,35 @@ public HoodieRestoreMetadata restoreToInstant(final String
savepointToRestoreTim
Timer.Context timerContext = metrics.getRollbackCtx();
try {
HoodieTable<T, I, K, O> table = initTable(WriteOperationType.UNKNOWN,
Option.empty(), initialMetadataTableIfNecessary);
+ // We cannot restore to a point older than the penultimate compaction on
the MDT,
+ // as that would leave the metadata table in an inconsistent state.
+ // Skip this check when the target is a valid savepoint — both
restoreToSavepoint() and
+ // direct restoreToInstant() calls with a savepoint are safe because the
restore operation
+ // properly rolls back the MDT alongside the data table.
+ boolean isSavepointRestore =
table.getCompletedSavepointTimeline().getInstants()
+ .stream().anyMatch(i ->
i.requestedTime().equals(savepointToRestoreTimestamp));
+ if (!isSavepointRestore &&
table.getMetaClient().getTableConfig().isMetadataTableAvailable()) {
+ try {
+ HoodieTableMetaClient mdtMetaClient = HoodieTableMetaClient.builder()
+ .setConf(storageConf.newInstance())
+
.setBasePath(getMetadataTableBasePath(config.getBasePath())).build();
+ List<HoodieInstant> completedCompactions =
mdtMetaClient.getCommitTimeline().filterCompletedInstants().getInstants();
+ if (!completedCompactions.isEmpty()) {
+ int boundaryIdx = Math.max(0, completedCompactions.size() - 2);
+ String boundaryCompactionTimestamp =
completedCompactions.get(boundaryIdx).requestedTime();
+ if (LESSER_THAN_OR_EQUALS.test(savepointToRestoreTimestamp,
boundaryCompactionTimestamp)) {
+ throw new HoodieRestoreException(String.format(
+ "Restoring to a commit %s older than boundary compaction
commit %s would leave the MDT inconsistent."
+ + " Failing the restore request, as metadata table bootstrap
may be required.",
+ savepointToRestoreTimestamp, boundaryCompactionTimestamp));
+ }
+ }
+ } catch (HoodieRestoreException e) {
+ throw e;
+ } catch (Exception e) {
Review Comment:
🤖 This broad `catch (Exception e)` swallows everything — including
programming errors like NPE or ClassCastException — and falls back to the
unguarded restore path, just logging a warning. If the MDT is genuinely broken,
that's exactly the case where we *shouldn't* silently proceed. Could you narrow
this to the expected I/O failures (e.g., `IOException`, `HoodieIOException`) so
unexpected errors still surface?
<sub><i>- AI-generated; verify before applying. React 👍/👎 to flag
quality.</i></sub>
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/BaseHoodieWriteClient.java:
##########
@@ -919,6 +919,35 @@ public HoodieRestoreMetadata restoreToInstant(final String
savepointToRestoreTim
Timer.Context timerContext = metrics.getRollbackCtx();
try {
HoodieTable<T, I, K, O> table = initTable(WriteOperationType.UNKNOWN,
Option.empty(), initialMetadataTableIfNecessary);
+ // We cannot restore to a point older than the penultimate compaction on
the MDT,
+ // as that would leave the metadata table in an inconsistent state.
+ // Skip this check when the target is a valid savepoint — both
restoreToSavepoint() and
+ // direct restoreToInstant() calls with a savepoint are safe because the
restore operation
+ // properly rolls back the MDT alongside the data table.
+ boolean isSavepointRestore =
table.getCompletedSavepointTimeline().getInstants()
+ .stream().anyMatch(i ->
i.requestedTime().equals(savepointToRestoreTimestamp));
+ if (!isSavepointRestore &&
table.getMetaClient().getTableConfig().isMetadataTableAvailable()) {
+ try {
+ HoodieTableMetaClient mdtMetaClient = HoodieTableMetaClient.builder()
+ .setConf(storageConf.newInstance())
+
.setBasePath(getMetadataTableBasePath(config.getBasePath())).build();
+ List<HoodieInstant> completedCompactions =
mdtMetaClient.getCommitTimeline().filterCompletedInstants().getInstants();
+ if (!completedCompactions.isEmpty()) {
+ int boundaryIdx = Math.max(0, completedCompactions.size() - 2);
+ String boundaryCompactionTimestamp =
completedCompactions.get(boundaryIdx).requestedTime();
+ if (LESSER_THAN_OR_EQUALS.test(savepointToRestoreTimestamp,
boundaryCompactionTimestamp)) {
Review Comment:
🤖 The comment above says "older than the penultimate compaction" (strict
less-than), but `LESSER_THAN_OR_EQUALS` also blocks the case where
`savepointToRestoreTimestamp` exactly equals `boundaryCompactionTimestamp`. The
error message string repeats the "older than" wording. Is the equals-case
intentionally blocked, or should this be `LESSER_THAN`? @nsivabalan could you
weigh in on the intended boundary semantics for MDT consistency?
<sub><i>- AI-generated; verify before applying. React 👍/👎 to flag
quality.</i></sub>
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]