prashantwason commented on code in PR #18669:
URL: https://github.com/apache/hudi/pull/18669#discussion_r3203388257


##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/DefaultSource.scala:
##########
@@ -134,7 +134,19 @@ class DefaultSource extends RelationProvider
       parameters
     }
 
-    val relation = DefaultSource.createRelation(sqlContext, metaClient, schema, options.toMap)
+    // Spark's DataSource.resolveRelation() invokes this 3-arg overload directly via the
+    // SchemaRelationProvider path when a user-supplied schema is present (e.g.
+    // spark.read.schema(...).load(path)). The 2-arg overload catches
+    // HoodieSchemaNotFoundException and returns an EmptyRelation, but that catch is bypassed
+    // on this path, so we mirror the same handling here. Preserve the caller-supplied schema
+    // so subsequent query analysis (e.g. column resolution in WHERE clauses) sees the
+    // HMS-known columns even though the on-disk table is schemaless.
+    val relation = try {
+      DefaultSource.createRelation(sqlContext, metaClient, schema, options.toMap)
+    } catch {
+      case _: HoodieSchemaNotFoundException =>
+        new EmptyRelation(sqlContext, Option(schema).getOrElse(new StructType()))

Review Comment:
   Update: had to revert the simplification in 5e25a570dd0f. It turns out the
   2-arg createRelation overload (line 78) re-enters this 3-arg method with
   schema=null, so the assumption that SchemaRelationProvider always passes a
   non-null schema doesn't hold for internal callers. The defensive
   Option(schema).getOrElse(new StructType()) was load-bearing: removing it broke
   TestCOWDataSource.testReadOfAnEmptyTable on spark3.3 / spark3.5 with an NPE in
   BaseRelation.schema().isEmpty. The code comment now documents the
   internal-recursion reason.
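
   For anyone following along, here is a minimal sketch of why the guard matters. It does not use the real Spark or Hudi classes: `Schema`, `SchemaNotFound`, `EmptyRel`, and `Provider` are hypothetical stand-ins for StructType, HoodieSchemaNotFoundException, EmptyRelation, and DefaultSource, and the stand-in `build` always throws to model a schemaless table.

```scala
// Hypothetical stand-ins only; these are NOT the real Spark/Hudi types.
final case class Schema(fields: List[String]) {
  def isEmpty: Boolean = fields.isEmpty
}

final class SchemaNotFound extends RuntimeException("schema not found")

final case class EmptyRel(schema: Schema)

object Provider {
  // Models a schemaless on-disk table: building the relation always fails.
  private def build(schema: Schema): EmptyRel = throw new SchemaNotFound

  // Models the 2-arg overload: it re-enters the schema-taking overload with null.
  def createRelation(): EmptyRel = createRelation(null)

  // Models the 3-arg overload: schema is non-null for external callers,
  // but null when reached through the overload above.
  def createRelation(schema: Schema): EmptyRel =
    try build(schema)
    catch {
      case _: SchemaNotFound =>
        // Option(null) evaluates to None, so a null schema falls back to an
        // empty schema instead of leaking null into the relation.
        EmptyRel(Option(schema).getOrElse(Schema(Nil)))
    }
}
```

   With the guard in place, `Provider.createRelation().schema.isEmpty` is safe to call, while a caller-supplied schema is passed through unchanged. Dropping the `Option(...)` wrapper in this sketch would leave `schema` null on the no-arg path, which is the shape of the NPE described above.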



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
