the-other-tim-brown commented on code in PR #14344:
URL: https://github.com/apache/hudi/pull/14344#discussion_r2598923115
##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/catalog/TableOptionProperties.java:
##########
@@ -202,19 +200,17 @@ public static Map<String, String> getTableOptions(Map<String, String> options) {
   public static Map<String, String> translateFlinkTableProperties2Spark(
       CatalogTable catalogTable,
-      Configuration hadoopConf,
       Map<String, String> properties,
       List<String> partitionKeys,
       boolean withOperationField) {
     RowType rowType =
         supplementMetaFields(DataTypeUtils.toRowType(catalogTable.getUnresolvedSchema()), withOperationField);
-    Schema schema = AvroSchemaConverter.convertToSchema(rowType);
-    MessageType messageType = ParquetTableSchemaResolver.convertAvroSchemaToParquet(schema, hadoopConf);
Review Comment:
That's right, we are skipping the extra step that translates the schema into
the Parquet serialization structure.
For the list representation, for example, we don't need to know how many
levels are used to represent a list in the Parquet file; the Spark schema does
not include this information.
`assumeRepeatedIsListElement` and `readInt96AsFixed` are used when
translating from Parquet to Avro, so they do not apply to this path, which
translates Avro to Parquet to a Spark struct.
`uuid` always translates the same way in the Spark schema, so
`writeParquetUUID` is not relevant here.
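
To illustrate the point about list representations, here is a minimal, hypothetical sketch (not Hudi code; the type descriptors and mapping are invented for illustration) showing that a direct Avro-style-to-Spark-style type translation never needs Parquet's repetition/definition levels, since the Spark schema only records the element type:

```java
import java.util.Map;

// Hypothetical sketch: map a toy Avro-style type string straight to a
// Spark-style type string. Note that no Parquet list-encoding detail
// (2-level vs 3-level lists) is needed or produced anywhere.
public class AvroToSparkSketch {
  // Illustrative primitive mappings only; not the real converter tables.
  static final Map<String, String> PRIMITIVES = Map.of(
      "string", "string",
      "int", "integer",
      "long", "long",
      "boolean", "boolean");

  // Translate a toy descriptor such as "array<int>" directly to a
  // Spark-style type string, recursing into the element type.
  static String toSparkType(String avroType) {
    if (avroType.startsWith("array<") && avroType.endsWith(">")) {
      String element = avroType.substring(6, avroType.length() - 1);
      return "array<" + toSparkType(element) + ">";
    }
    String mapped = PRIMITIVES.get(avroType);
    if (mapped == null) {
      throw new IllegalArgumentException("unsupported type: " + avroType);
    }
    return mapped;
  }

  public static void main(String[] args) {
    // The Spark-side result carries only the element type, nothing about
    // how Parquet would nest the repeated group.
    System.out.println(toSparkType("array<int>")); // array<integer>
  }
}
```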
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]