nsivabalan commented on PR #18147: URL: https://github.com/apache/hudi/pull/18147#issuecomment-4765630557
Hi @nada-attia, picking this up to address @yihua's review. The original approach — catch `UnresolvedException` in `ProducesHudiMetaFields.unapply` and rewrite it as a generic Hudi error — does close out the cryptic message, but as @yihua [pointed out](https://github.com/apache/hudi/pull/18147#discussion_r2795763747) it intercepts Spark before `CheckAnalysis` runs, so we lose Spark's structured `UNRESOLVED_COLUMN.WITH_SUGGESTION` errors with "did you mean" hints. Reworked to follow @yihua's suggestion in [#discussion_r2796386493](https://github.com/apache/hudi/pull/18147#discussion_r2796386493): let Spark's analyzer surface its own error. Force-pushed (squashed the 6 commits) — diff is now small (3 files, ~190 lines, mostly the test). ### Code changes **`HoodieAnalysis.ProducesHudiMetaFields.unapply`** — gate the output inspection on `resolved.resolved`. If the plan is still not fully resolved, return `None` so the pattern-match falls through; Spark's `CheckAnalysis` then runs in its normal phase and produces the precise error. ```diff - if (resolved.output.exists(attr => isMetaField(attr.name))) { + // If the plan is still not fully resolved (e.g., references non-existent + // tables or columns), fall through. Spark's CheckAnalysis runs later and + // produces precise UNRESOLVED_COLUMN / TABLE_OR_VIEW_NOT_FOUND errors with + // "did you mean" suggestions; intercepting UnresolvedException here would + // discard that context. Only inspect the output once the plan is resolved. + if (resolved.resolved && resolved.output.exists(attr => isMetaField(attr.name))) { ``` **`HoodieSparkBaseAnalysis.ResolveReferences` (MIT case)** — if `sourceTable` is still unresolved after `analyzer.execute`, return the partially-resolved `MergeIntoTable` plan instead of descending into `resolveAssignments`. Spark's downstream rules then handle the error correctly. ```diff val sourceTable = if (sourceTableO.resolved) sourceTableO else analyzer.execute(sourceTableO) val m = mO.asInstanceOf[MergeIntoTable].copy(targetTable = targetTable, sourceTable = sourceTable) - EliminateSubqueryAliases(targetTable) match { + if (!sourceTable.resolved) { + m // let Spark's CheckAnalysis surface the unresolved-source error + } else EliminateSubqueryAliases(targetTable) match { ``` ### Reverted from the prior version - `failUnresolvedQuery` on the `HoodieCatalystPlansUtils` trait and the four version-specific impls (3.3 / 3.4 / 3.5 / 4.0) - `collectUnresolvedReferences` helper in `HoodieAnalysis` - The `try/catch UnresolvedException` block in `ProducesHudiMetaFields.unapply` - The unused `UnresolvedException` import ### Tests Rewrote `TestHoodieAnalysisErrorHandling`: - 3 scenarios: MERGE INTO with unresolved column in source query, MERGE INTO with unresolved column in ON predicate, INSERT INTO from non-existent source table. - Assertions now check that Spark's native error patterns (`UNRESOLVED_COLUMN` / `TABLE_OR_VIEW_NOT_FOUND` / "cannot be resolved") are present **and** that the specific column/table name appears in the message. - Each test also asserts that the prior Hudi wrapper (`"Failed to resolve query"` / `"Please check for: (1) typos"`) is NOT present, so a future regression that re-introduces the rewrap fails the test. - Helpers accept both Spark 3.3 (pre-error-class) and Spark 3.4+ (error-class) phrasings, addressing the version-difference concern in [#discussion_r2795763752](https://github.com/apache/hudi/pull/18147#discussion_r2795763752). ### Rebase Rebased onto latest master (was 424 commits behind). @yihua @danny0405 PTAL. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
