prashantwason commented on code in PR #18669:
URL: https://github.com/apache/hudi/pull/18669#discussion_r3198847170
##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/DefaultSource.scala:
##########
@@ -134,7 +134,19 @@ class DefaultSource extends RelationProvider
         parameters
       }
-      val relation = DefaultSource.createRelation(sqlContext, metaClient, schema, options.toMap)
+      // Spark's DataSource.resolveRelation() invokes this 3-arg overload directly via the
+      // SchemaRelationProvider path when a user-supplied schema is present (e.g.
+      // spark.read.schema(...).load(path)). The 2-arg overload catches
+      // HoodieSchemaNotFoundException and returns an EmptyRelation, but that catch is bypassed
+      // on this path, so we mirror the same handling here. Preserve the caller-supplied schema
+      // so subsequent query analysis (e.g. column resolution in WHERE clauses) sees the
+      // HMS-known columns even though the on-disk table is schemaless.
+      val relation = try {
+        DefaultSource.createRelation(sqlContext, metaClient, schema, options.toMap)
+      } catch {
+        case _: HoodieSchemaNotFoundException =>
+          new EmptyRelation(sqlContext, Option(schema).getOrElse(new StructType()))
Review Comment:
Done in bd4e4c24a2f7 - passed schema through directly. Agreed the Option
wrapper was overly defensive given the SchemaRelationProvider contract.
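
For context, a plausible sketch of what "passed schema through directly" means here, assuming the overload shown in the diff above (not verified against commit bd4e4c24a2f7):

```scala
// Hypothetical sketch: under the SchemaRelationProvider contract, Spark only
// calls this 3-arg createRelation overload when the user supplied a schema,
// so `schema` is non-null on this path and the Option(...).getOrElse wrapper
// can be dropped.
val relation = try {
  DefaultSource.createRelation(sqlContext, metaClient, schema, options.toMap)
} catch {
  case _: HoodieSchemaNotFoundException =>
    // Preserve the caller-supplied schema so downstream analysis
    // (e.g. WHERE-clause column resolution) still resolves columns.
    new EmptyRelation(sqlContext, schema)
}
```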
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]