hudi-agent commented on code in PR #18877:
URL: https://github.com/apache/hudi/pull/18877#discussion_r3428939730
##########
hudi-client/hudi-flink-client/src/main/java/org/apache/hudi/io/storage/row/parquet/ParquetRowDataWriter.java:
##########
@@ -64,16 +69,19 @@ public class ParquetRowDataWriter {
public ParquetRowDataWriter(
RecordConsumer recordConsumer,
- RowType rowType,
- GroupType schema,
- boolean utcTimestamp) {
+ boolean utcTimestamp,
+ HoodieSchema hoodieSchema) {
this.recordConsumer = recordConsumer;
this.utcTimestamp = utcTimestamp;
+ RowType rowType = HoodieSchemaConverter.convertToRowType(hoodieSchema);
this.filedWriters = new FieldWriter[rowType.getFieldCount()];
this.fieldNames = rowType.getFieldNames().toArray(new String[0]);
for (int i = 0; i < rowType.getFieldCount(); i++) {
- this.filedWriters[i] = createWriter(rowType.getTypeAt(i));
+ String fieldName = fieldNames[i];
+ HoodieSchema fieldSchema =
hoodieSchema.getNonNullType().getField(fieldName).map(HoodieSchemaField::schema)
Review Comment:
🤖 nit: `hoodieSchema.getNonNullType()` is called on every loop iteration —
could you extract it to a local variable (e.g. `HoodieSchema nonNullSchema =
hoodieSchema.getNonNullType();`) above the `for` loop? It would also align with
how `nonNullSchema` is introduced inside `createWriter`.
<sub><i>- AI-generated; verify before applying. React 👍/👎 to flag
quality.</i></sub>
##########
hudi-client/hudi-flink-client/src/main/java/org/apache/hudi/io/storage/row/parquet/ParquetSchemaConverter.java:
##########
@@ -205,12 +209,35 @@ public static Pair<Type, Type>
parquetMapKeyValueType(GroupType mapType) {
return Pair.of(keyValue.getType(MAP_KEY_NAME),
keyValue.getType(MAP_VALUE_NAME));
}
+ /**
+ * Converts a Flink row type to a Parquet message type.
+ *
+ * <p>This overload preserves the pre-VECTOR public API, but converting from
+ * {@link RowType} to {@link HoodieSchema} loses VECTOR logical metadata
such as dimension and
+ * element type. Prefer {@link #convertToParquetMessageType(String,
HoodieSchema)} whenever the
+ * caller already has a metadata-aware {@link HoodieSchema}.
+ */
public static MessageType convertToParquetMessageType(String name, RowType
rowType) {
+ HoodieSchema hoodieSchema = HoodieSchemaConverter.convertToSchema(rowType);
+ return convertToParquetMessageType(name, hoodieSchema);
+ }
Review Comment:
🤖 nit: the `RowType` overload's Javadoc tells callers to prefer this
`HoodieSchema` overload, but the preferred overload itself has no Javadoc. It
might be worth moving (or duplicating) the main description here so a developer
who lands directly on this method understands its behaviour and the
relationship without having to find the other overload first.
<sub><i>- AI-generated; verify before applying. React 👍/👎 to flag
quality.</i></sub>
##########
hudi-client/hudi-flink-client/src/main/java/org/apache/hudi/util/RowDataToAvroConverters.java:
##########
@@ -331,6 +331,10 @@ private static RowDataToAvroConverter
createArrayConverter(ArrayType arrayType,
@Override
public Object convert(HoodieSchema schema, Object object) {
+ if (schema.getType() == HoodieSchemaType.VECTOR) {
+ HoodieSchema.Vector vectorSchema = (HoodieSchema.Vector) schema;
+ return new GenericData.Fixed(schema.toAvroSchema(),
VectorConversionUtils.encodeVectorArrayData((ArrayData) object, vectorSchema));
Review Comment:
🤖 nit: `schema.toAvroSchema()` is used here right after casting `schema` to
`vectorSchema` — could you use `vectorSchema.toAvroSchema()` instead to keep
the two references consistent within the block?
<sub><i>- AI-generated; verify before applying. React 👍/👎 to flag
quality.</i></sub>
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]