danny0405 commented on code in PR #18958:
URL: https://github.com/apache/hudi/pull/18958#discussion_r3400263740
##########
hudi-client/hudi-flink-client/src/main/java/org/apache/hudi/util/RowDataToAvroConverters.java:
##########
@@ -232,7 +232,10 @@ public Object convert(HoodieSchema schema, Object object) {
converter = createArrayConverter((ArrayType) type, utcTimezone);
break;
case ROW:
- converter = createRowConverter((RowType) type, utcTimezone);
+ RowType rowType = (RowType) type;
+ converter = HoodieSchemaConverter.isBlobStructure(rowType)
Review Comment:
This selects the BLOB converter from the Flink `RowType` alone, but the
converter is later invoked with an arbitrary `HoodieSchema`. That breaks
existing/plain BLOB-shaped records that do not carry the BLOB logical type:
`RowDataQueryContexts.fromSchema(schema)` will turn
`HoodieSchemaTestUtils.createPlainBlobRecord(...)` into the same `ROW<type
STRING, data BYTES, reference ROW<...>>`, this branch will choose
`createBlobConverter`, and then `createBlobConverter` builds `new
GenericData.EnumSymbol(fields.get(0).schema().toAvroSchema(), ...)` even though
`fields.get(0)` is a plain STRING schema. That either throws during conversion
or writes the wrong representation for a non-BLOB record. Can we gate the enum
special-case on the target Hoodie schema instead (for example, keep ROW
conversion generic and emit `EnumSymbol` only when the field schema is `ENUM`,
or make the BLOB converter fall back unless
`schema.getNonNullType().isBlobField()`)?
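
To illustrate the first option, here is a minimal, self-contained sketch of gating the enum special-case on the target field schema rather than on the row shape. Everything in it is a hypothetical stand-in: `TargetType` models the target field's schema type and `EnumSymbol` models `GenericData.EnumSymbol`; neither is the real Hudi or Avro class, and the `"INLINE"` value is only an illustrative symbol.

```java
import java.util.Objects;

public class SchemaGatedEnumSketch {

    // Hypothetical stand-in for the type of the target field schema
    // (not Hudi's HoodieSchema API).
    enum TargetType { STRING, ENUM }

    // Hypothetical stand-in for Avro's GenericData.EnumSymbol.
    static final class EnumSymbol {
        final String symbol;

        EnumSymbol(String symbol) {
            this.symbol = symbol;
        }

        @Override
        public boolean equals(Object other) {
            return other instanceof EnumSymbol
                && ((EnumSymbol) other).symbol.equals(symbol);
        }

        @Override
        public int hashCode() {
            return Objects.hash(symbol);
        }

        @Override
        public String toString() {
            return "EnumSymbol(" + symbol + ")";
        }
    }

    // The decision depends on the target schema of the field, not on the
    // shape of the Flink row: a plain STRING field keeps its string value,
    // and only a field whose target schema is ENUM is wrapped as a symbol.
    static Object convertTypeField(TargetType targetFieldType, String value) {
        if (targetFieldType == TargetType.ENUM) {
            return new EnumSymbol(value);
        }
        return value;
    }

    public static void main(String[] args) {
        // Same row shape, different target schemas, different output.
        System.out.println(convertTypeField(TargetType.ENUM, "INLINE"));
        System.out.println(convertTypeField(TargetType.STRING, "INLINE"));
    }
}
```

With this shape, a plain BLOB-shaped record and a logical BLOB record can share the same `RowType` and still be written correctly, because the choice is made per field from the schema passed to `convert`.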