rahil-c opened a new pull request, #19048: URL: https://github.com/apache/hudi/pull/19048
Ports apache/hudi#18147 onto current master and addresses the remaining review feedback. Closes #18151. ### What & why When a Spark SQL query against a Hudi table contained unresolved references (typos in column names, missing source tables, etc.), Hudi's analysis layer was masking Spark's own error reporting — discarding the structured `UNRESOLVED_COLUMN.WITH_SUGGESTION` / `TABLE_OR_VIEW_NOT_FOUND` errors (including "did you mean" hints) that Spark already produces. Two guarded one-line fall-throughs now let Spark's `CheckAnalysis` surface its native error, per @yihua's suggestion in https://github.com/apache/hudi/pull/18147#discussion_r2796386493: - `HoodieAnalysis.ProducesHudiMetaFields.unapply` — gate the output check on `resolved.resolved`; if the plan is still unresolved, return `None` and let the pattern-match fall through. Spark's `CheckAnalysis` then runs in its normal phase. (Inspecting `resolved.output` on an unresolved plan could otherwise throw `UnresolvedException` and lose context.) - `HoodieSparkBaseAnalysis.ResolveReferences` (MERGE INTO) — if the source table is still unresolved after the analyzer pass, return the partially-resolved `MergeIntoTable` instead of descending into `resolveAssignments`. Spark's downstream rules already know how to report the precise error. ### On top of the original commit The original commit (credit: @nada-attia / @nsivabalan) is preserved as-is. Added one commit addressing @yihua's test-assertion note (https://github.com/apache/hudi/pull/18147#discussion_r2795763747): the assertions used loose substrings like `"cannot be resolved"` that match Spark's output even without this change. On Spark 3.4+ they now require the structured error-class token (`UNRESOLVED_COLUMN` / `TABLE_OR_VIEW_NOT_FOUND`) that only Spark's `CheckAnalysis` emits and the old Hudi rewrite never carried, with a Spark 3.3 legacy-phrasing fallback (branching on `HoodieSparkUtils.gteqSpark3_4`, matching the existing `HoodieSparkSqlTestBase` pattern). The specific column/table name is still required. ### Risk **Low** — both source changes are guarded fall-throughs in the resolution path; existing query-resolution behavior is unchanged for plans that fully resolve. The meta-field projection simply defers to a later fixed-point analyzer iteration. ### Verification Build verified locally: `mvn install -pl hudi-spark-datasource/hudi-spark -am -DskipTests -Dspark3.5 -Dscala-2.12` (0 errors, including scalastyle). Tests (`TestHoodieAnalysisErrorHandling`) to be run by reviewer / CI. Opened as draft. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
