hudi-agent commented on code in PR #18893:
URL: https://github.com/apache/hudi/pull/18893#discussion_r3417150132


##########
hudi-utilities/src/main/java/org/apache/hudi/utilities/schema/FilebasedSchemaProvider.java:
##########
@@ -72,26 +71,39 @@ public FilebasedSchemaProvider(TypedProperties props, 
JavaSparkContext jssc) {
     }
   }
 
-  private Schema parseSchema(String schemaFile) {
+  private HoodieSchema parseSchema(String schemaFile) {
     return readSchemaFromFile(schemaFile, this.fs, config);
   }
 
   @Override
+  public HoodieSchema getSourceHoodieSchema() {
+    return sourceSchema;
+  }
+
+  @Override
+  @Deprecated

Review Comment:
   🤖 I think this can stack-overflow when `targetSchema` is null (i.e. when 
`TARGET_SCHEMA_FILE` isn't configured). `super.getTargetHoodieSchema()` in 
`SchemaProvider` calls `getTargetSchema()` via virtual dispatch, which lands 
back here on `FilebasedSchemaProvider.getTargetSchema()` → 
`getTargetHoodieSchema()` → `super...` → infinite recursion. Could you instead 
return `sourceSchema` (or `getSourceHoodieSchema()`) as the fallback? @yihua
   
   <sub><i>- AI-generated; verify before applying. React 👍/👎 to flag 
quality.</i></sub>



##########
hudi-utilities/src/main/java/org/apache/hudi/utilities/schema/HiveSchemaProvider.java:
##########
@@ -85,19 +83,25 @@ public HiveSchemaProvider(TypedProperties props, 
JavaSparkContext jssc) {
     }
   }
 
+  @Override
+  public HoodieSchema getSourceHoodieSchema() {
+    return sourceHoodieSchema;

Review Comment:
   🤖 Same recursion concern as in `FilebasedSchemaProvider`: when 
`targetHoodieSchema` is null (no `TARGET_SCHEMA_TABLE` configured), 
`super.getTargetHoodieSchema()` will call back into the subclass 
`getTargetSchema()` → `getTargetHoodieSchema()` → 
`super.getTargetHoodieSchema()` and stack-overflow. Could the fallback return 
`sourceHoodieSchema` directly instead?
   
   <sub><i>- AI-generated; verify before applying. React 👍/👎 to flag 
quality.</i></sub>



##########
hudi-utilities/src/main/java/org/apache/hudi/utilities/schema/ProtoClassBasedSchemaProvider.java:
##########
@@ -79,21 +79,20 @@ public ProtoClassBasedSchemaProvider(TypedProperties props, 
JavaSparkContext jss
   }
 
   @Override
-  public Schema getSourceSchema() {
+  public HoodieSchema getSourceHoodieSchema() {

Review Comment:
   🤖 By dropping the `getSourceSchema()`/`getTargetSchema()` overrides and only 
overriding the `*HoodieSchema` variants, any existing caller of the deprecated 
`provider.getSourceSchema()` / `provider.getTargetSchema()` will now hit the 
base class's `throw new UnsupportedOperationException(...)`. Other providers in 
this PR (Filebased/Hive/Simple) added explicit `@Deprecated` bridge methods — 
should this one (and `JdbcbasedSchemaProvider`, `RowBasedSchemaProvider`, 
`SchemaRegistryProvider`) do the same for consistency?
   
   <sub><i>- AI-generated; verify before applying. React 👍/👎 to flag 
quality.</i></sub>



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to