lokeshj1703 opened a new pull request, #18977:
URL: https://github.com/apache/hudi/pull/18977

   ### Describe the issue this Pull Request addresses
   
   Closes #18668.
   
   `org.apache.hudi.DefaultSource` has two read-side overloads of 
`createRelation`:
   
   - The 2-arg overload `createRelation(sqlContext, parameters)` wraps its body 
in a `try { … } catch { case _: HoodieSchemaNotFoundException => new 
EmptyRelation(…) }`. This catch was added in [HUDI-7147 / 
#10689](https://github.com/apache/hudi/pull/10689) so that schema-less Hudi 
tables (no commits / commit metadata deleted / legacy schema-less layout) do 
not explode at query analysis time.
   - The 3-arg overload `createRelation(sqlContext, optParams, schema)` calls 
`DefaultSource.createRelation(sqlContext, metaClient, schema, options.toMap)` 
directly, **without** the same catch.
   
   Spark's `DataSource.resolveRelation()` chooses the overload based on whether 
a user-supplied schema is present:
   
   ```scala
   case (dataSource: SchemaRelationProvider, Some(schema)) =>
     dataSource.createRelation(sparkSession.sqlContext, caseInsensitiveOptions, schema)
   case (dataSource: RelationProvider, _) =>
     dataSource.createRelation(sparkSession.sqlContext, caseInsensitiveOptions)
   ```
   
   So any read path that supplies a schema (e.g. 
`spark.read.schema(s).format("hudi").load(path)`, or HMS-catalog resolution 
that already knows the schema) bypasses the 2-arg catch and surfaces 
`HoodieSchemaNotFoundException` directly.
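   
   The bypass can be reproduced in a small standalone model. This is only a sketch: `DispatchSketch`, `ToySource`, `SchemaNotFound`, `Relation`, and `resolve` are hypothetical stand-ins, not the real Spark or Hudi classes, and the provider traits are simplified to take just an options map.
   
   ```scala
   object DispatchSketch {
     // Stand-in for HoodieSchemaNotFoundException (hypothetical, illustration only).
     final class SchemaNotFound(msg: String) extends RuntimeException(msg)

     sealed trait Relation
     case object EmptyRelation extends Relation

     // Simplified stand-ins for Spark's provider traits.
     trait RelationProvider {
       def createRelation(options: Map[String, String]): Relation
     }
     trait SchemaRelationProvider {
       def createRelation(options: Map[String, String], schema: String): Relation
     }

     object ToySource extends RelationProvider with SchemaRelationProvider {
       // Simulates a table whose schema cannot be resolved.
       private def load(options: Map[String, String]): Relation =
         throw new SchemaNotFound("no schema for " + options.getOrElse("path", "?"))

       // No-schema overload: guarded, like the existing 2-arg overload.
       override def createRelation(options: Map[String, String]): Relation =
         try load(options) catch { case _: SchemaNotFound => EmptyRelation }

       // User-schema overload: unguarded, like the 3-arg overload before this change.
       override def createRelation(options: Map[String, String], schema: String): Relation =
         load(options)
     }

     // Simplified version of the overload selection quoted above.
     def resolve(source: AnyRef,
                 userSchema: Option[String],
                 options: Map[String, String]): Relation =
       (source, userSchema) match {
         case (s: SchemaRelationProvider, Some(schema)) => s.createRelation(options, schema)
         case (s: RelationProvider, _)                  => s.createRelation(options)
         case _ => throw new IllegalArgumentException("unsupported source")
       }

     def main(args: Array[String]): Unit = {
       val opts = Map("path" -> "/tmp/tbl")
       // Without a user schema the guarded overload is chosen.
       println(resolve(ToySource, None, opts))
       // With a user schema the unguarded overload is chosen and the exception escapes.
       println(scala.util.Try(resolve(ToySource, Some("id INT"), opts)).isFailure)
     }
   }
   ```
   
   In this model the only difference between the two calls is whether a schema is passed, which is enough to route around the guarded overload.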
   
   ### Summary and Changelog
   
   - **`DefaultSource.scala` (3-arg `createRelation`)**: mirror the existing 
2-arg catch so `HoodieSchemaNotFoundException` resolves to `EmptyRelation` on 
this overload too. Adds an inline comment explaining why both overloads need 
the same catch.
   - **`TestCOWDataSource.testReadOfAnEmptyTableWithUserSuppliedSchema`**: 
sibling of the existing `testReadOfAnEmptyTable` that asserts 
`spark.read.schema(userSchema).format("hudi").load(basePath).count() == 0` 
instead of throwing on a schema-less table.
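   
   As a rough illustration of the shape of the change (not the actual diff), the sketch below gives both overloads the same catch and checks that each returns an empty result. `MirroredCatchSketch`, `SchemaNotFound`, `Relation`, and `load` are hypothetical stand-ins; the real overloads take a `sqlContext` and an options map, and the real test goes through `spark.read`.
   
   ```scala
   object MirroredCatchSketch {
     // Stand-in for HoodieSchemaNotFoundException (hypothetical, illustration only).
     final class SchemaNotFound(msg: String) extends RuntimeException(msg)

     sealed trait Relation { def count: Long }
     case object EmptyRelation extends Relation { val count = 0L }

     // Simulates a table whose schema cannot be resolved.
     private def load(path: String): Relation =
       throw new SchemaNotFound(s"no schema found for $path")

     // No-schema overload: the catch that already existed.
     def createRelation(path: String): Relation =
       try load(path) catch { case _: SchemaNotFound => EmptyRelation }

     // User-schema overload: the same catch, so both entry points agree.
     def createRelation(path: String, userSchema: Seq[String]): Relation =
       try load(path) catch { case _: SchemaNotFound => EmptyRelation }

     def main(args: Array[String]): Unit = {
       // Both read paths now yield an empty relation instead of throwing.
       assert(createRelation("/tmp/empty_tbl").count == 0L)
       assert(createRelation("/tmp/empty_tbl", Seq("id")).count == 0L)
     }
   }
   ```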
   
   ### Impact
   
   User-facing: a Hudi table whose schema cannot be resolved now returns an
   empty relation when queried with a user-supplied schema, matching the existing
   behavior when no schema is supplied. No previously successful read path
   changes behavior; the change only converts a previously thrown
   `HoodieSchemaNotFoundException` into an empty result under exactly the same
   failure condition.
   
   ### Risk Level
   
   low — minimal scope (one try/catch mirroring existing logic), covered by a 
new unit test that mirrors an existing one.
   
   ### Documentation Update
   
   none
   
   ### Contributor's checklist
   
   - [x] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [x] Enough context is provided in the sections above
   - [x] Adequate tests were added if applicable
   - [x] CI passed
   
   🤖 Generated with [Claude Code](https://claude.com/claude-code)

