aokolnychyi commented on a change in pull request #1335:
URL: https://github.com/apache/iceberg/pull/1335#discussion_r470923026
##########
File path: parquet/src/main/java/org/apache/iceberg/parquet/ParquetSchemaUtil.java
##########
@@ -41,8 +43,29 @@ public static MessageType convert(Schema schema, String name) {
return new TypeToMessageType().convert(schema, name);
}
+ /**
+ * Converts a Parquet schema to an Iceberg schema. Fields without IDs are kept and assigned fallback IDs.
+ *
+ * @param parquetSchema a Parquet schema
+ * @return a matching Iceberg schema for the provided Parquet schema
+ */
public static Schema convert(MessageType parquetSchema) {
- MessageTypeToType converter = new MessageTypeToType(parquetSchema);
+ // if the Parquet schema does not contain ids, we assign fallback ids to top-level fields
+ // all remaining fields will get ids >= 1000 to avoid pruning columns without ids
+ MessageType parquetSchemaWithIds = hasIds(parquetSchema) ? parquetSchema : addFallbackIds(parquetSchema);
+ AtomicInteger nextId = new AtomicInteger(1000);
+ return convert(parquetSchemaWithIds, name -> nextId.getAndIncrement());
+ }
+
+ /**
+ * Converts a Parquet schema to an Iceberg schema. Fields without IDs are pruned.
+ *
+ * @param parquetSchema a Parquet schema
+ * @param nameToIdFunc a name to field id mapping function
+ * @return a matching Iceberg schema for the provided Parquet schema
+ */
+ public static Schema convert(MessageType parquetSchema, Function<String[], Integer> nameToIdFunc) {
Review comment:
Removed it from the public API.
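
For readers following the hunk above, here is a minimal sketch of the counter-based ID function that the one-argument `convert` passes along. It is an illustration only, not the code in the PR: the class `FallbackIdSketch` and the helper `assignIds` are hypothetical names, field paths are modeled as plain `String[]` values, and nothing here calls Iceberg or Parquet.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

public class FallbackIdSketch {

  // Hypothetical helper, not part of Iceberg: applies a name-to-ID function
  // to each field path and records the result keyed by the dotted path.
  static Map<String, Integer> assignIds(List<String[]> paths, Function<String[], Integer> nameToIdFunc) {
    Map<String, Integer> ids = new LinkedHashMap<>();
    for (String[] path : paths) {
      ids.put(String.join(".", path), nameToIdFunc.apply(path));
    }
    return ids;
  }

  public static void main(String[] args) {
    // Same pattern as in the diff: a counter starting at 1000 that ignores the field name.
    AtomicInteger nextId = new AtomicInteger(1000);
    List<String[]> paths = Arrays.asList(
        new String[] {"id"},
        new String[] {"location", "lat"},
        new String[] {"location", "long"});
    Map<String, Integer> ids = assignIds(paths, name -> nextId.getAndIncrement());
    System.out.println(ids);
  }
}
```

Because the lambda ignores its argument, the IDs depend only on the order in which fields are visited, so each path in this sketch simply receives the next value starting from 1000.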