hudi-agent commented on code in PR #18893:
URL: https://github.com/apache/hudi/pull/18893#discussion_r3417150132
##########
hudi-utilities/src/main/java/org/apache/hudi/utilities/schema/FilebasedSchemaProvider.java:
##########
@@ -72,26 +71,39 @@ public FilebasedSchemaProvider(TypedProperties props,
JavaSparkContext jssc) {
}
}
- private Schema parseSchema(String schemaFile) {
+ private HoodieSchema parseSchema(String schemaFile) {
return readSchemaFromFile(schemaFile, this.fs, config);
}
@Override
+ public HoodieSchema getSourceHoodieSchema() {
+ return sourceSchema;
+ }
+
+ @Override
+ @Deprecated
Review Comment:
🤖 I think this can stack-overflow when `targetSchema` is null (i.e. when
`TARGET_SCHEMA_FILE` isn't configured). `super.getTargetHoodieSchema()` in
`SchemaProvider` calls `getTargetSchema()` via virtual dispatch, which lands
back here on `FilebasedSchemaProvider.getTargetSchema()` →
`getTargetHoodieSchema()` → `super...` → infinite recursion. Could you instead
return `sourceSchema` (or `getSourceHoodieSchema()`) as the fallback? @yihua
<sub><i>- AI-generated; verify before applying. React 👍/👎 to flag
quality.</i></sub>
##########
hudi-utilities/src/main/java/org/apache/hudi/utilities/schema/HiveSchemaProvider.java:
##########
@@ -85,19 +83,25 @@ public HiveSchemaProvider(TypedProperties props,
JavaSparkContext jssc) {
}
}
+ @Override
+ public HoodieSchema getSourceHoodieSchema() {
+ return sourceHoodieSchema;
Review Comment:
🤖 Same recursion concern as in `FilebasedSchemaProvider`: when
`targetHoodieSchema` is null (no `TARGET_SCHEMA_TABLE` configured),
`super.getTargetHoodieSchema()` will call back into the subclass
`getTargetSchema()` → `getTargetHoodieSchema()` →
`super.getTargetHoodieSchema()` and stack-overflow. Could the fallback return
`sourceHoodieSchema` directly instead?
<sub><i>- AI-generated; verify before applying. React 👍/👎 to flag
quality.</i></sub>
##########
hudi-utilities/src/main/java/org/apache/hudi/utilities/schema/ProtoClassBasedSchemaProvider.java:
##########
@@ -79,21 +79,20 @@ public ProtoClassBasedSchemaProvider(TypedProperties props,
JavaSparkContext jss
}
@Override
- public Schema getSourceSchema() {
+ public HoodieSchema getSourceHoodieSchema() {
Review Comment:
🤖 By dropping the `getSourceSchema()`/`getTargetSchema()` overrides and only
overriding the `*HoodieSchema` variants, any existing caller of the deprecated
`provider.getSourceSchema()` / `provider.getTargetSchema()` will now hit the
base class's `throw new UnsupportedOperationException(...)`. Other providers in
this PR (Filebased/Hive/Simple) added explicit `@Deprecated` bridge methods —
should this one (and `JdbcbasedSchemaProvider`, `RowBasedSchemaProvider`,
`SchemaRegistryProvider`) do the same for consistency?
<sub><i>- AI-generated; verify before applying. React 👍/👎 to flag
quality.</i></sub>
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]