hudi-agent commented on code in PR #18715:
URL: https://github.com/apache/hudi/pull/18715#discussion_r3237115899
##########
hudi-common/src/main/java/org/apache/hudi/common/model/HoodieFileFormat.java:
##########
@@ -74,13 +74,35 @@ public String getFileExtension() {
return extension;
}
+ public boolean requiresSparkRecordType() {
+ return this == LANCE;
+ }
+
+ public HoodieRecord.HoodieRecordType
resolveRecordType(HoodieRecord.HoodieRecordType fallback) {
+ return requiresSparkRecordType() ? HoodieRecord.HoodieRecordType.SPARK :
fallback;
Review Comment:
🤖 Could this introduce a regression for `.orc` (and `.hfile`) base files?
Previously `filterKeysFromFile` always used the AVRO reader factory (which
supports Parquet/ORC/HFile via `HoodieAvroFileReaderFactory`). After this
change, when `config.getRecordMerger().getRecordType()` returns SPARK,
`resolveRecordTypeForExtension(".orc", SPARK)` returns SPARK — and
`HoodieSparkFileReaderFactory.newOrcFileReader` throws `HoodieIOException("Not
support read orc file")` (same for HFile). It might be safer to special-case
Lance only and keep returning AVRO for the others, or override the fallback
only when `requiresSparkRecordType()` is true (which it already does — the
concern is whether `getRecordMerger().getRecordType()` being SPARK + ORC base
file was previously working via the AVRO path). @yihua could you confirm
whether ORC bloom-index filtering with a Spark merger was previously functional?
<sub><i>- AI-generated; verify before applying. React 👍/👎 to flag
quality.</i></sub>
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]