rahil-c opened a new pull request, #19048:
URL: https://github.com/apache/hudi/pull/19048

   Ports apache/hudi#18147 onto current master and addresses the remaining 
review feedback.
   
   Closes #18151.
   
   ### What & why
   
   When a Spark SQL query against a Hudi table contained unresolved references 
(typos in column names, missing source tables, etc.), Hudi's analysis layer was 
masking Spark's own error reporting — discarding the structured 
`UNRESOLVED_COLUMN.WITH_SUGGESTION` / `TABLE_OR_VIEW_NOT_FOUND` errors 
(including "did you mean" hints) that Spark already produces.
   
   Two guarded one-line fall-throughs now let Spark's `CheckAnalysis` surface 
its native error, per @yihua's suggestion in 
https://github.com/apache/hudi/pull/18147#discussion_r2796386493:
   
   - `HoodieAnalysis.ProducesHudiMetaFields.unapply` — gate the output check on 
`resolved.resolved`; if the plan is still unresolved, return `None` and let the 
pattern-match fall through. Spark's `CheckAnalysis` then runs in its normal 
phase. (Inspecting `resolved.output` on an unresolved plan could otherwise 
throw `UnresolvedException` and lose context.)
   - `HoodieSparkBaseAnalysis.ResolveReferences` (MERGE INTO) — if the source 
table is still unresolved after the analyzer pass, return the 
partially-resolved `MergeIntoTable` instead of descending into 
`resolveAssignments`. Spark's downstream rules already know how to report the 
precise error.
   
   ### On top of the original commit
   
   The original commit (credit: @nada-attia / @nsivabalan) is preserved as-is. 
Added one commit addressing @yihua's test-assertion note 
(https://github.com/apache/hudi/pull/18147#discussion_r2795763747): the 
assertions used loose substrings like `"cannot be resolved"` that match Spark's 
output even without this change. On Spark 3.4+ they now require the structured 
error-class token (`UNRESOLVED_COLUMN` / `TABLE_OR_VIEW_NOT_FOUND`) that only 
Spark's `CheckAnalysis` emits and the old Hudi rewrite never carried, with a 
Spark 3.3 legacy-phrasing fallback (branching on 
`HoodieSparkUtils.gteqSpark3_4`, matching the existing `HoodieSparkSqlTestBase` 
pattern). The specific column/table name is still required.
   
   ### Risk
   
   **Low** — both source changes are guarded fall-throughs in the resolution 
path; existing query-resolution behavior is unchanged for plans that fully 
resolve. The meta-field projection simply defers to a later fixed-point 
analyzer iteration.
   
   ### Verification
   
   Build verified locally: `mvn install -pl hudi-spark-datasource/hudi-spark 
-am -DskipTests -Dspark3.5 -Dscala-2.12` (0 errors, including scalastyle). 
Tests (`TestHoodieAnalysisErrorHandling`) to be run by reviewer / CI.
   
   Opened as draft.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to