zhjwpku commented on code in PR #662:
URL: https://github.com/apache/iceberg-cpp/pull/662#discussion_r3342607218
##########
src/iceberg/avro/avro_writer.cc:
##########
@@ -81,6 +89,126 @@ Result<std::optional<int32_t>> ParseCodecLevel(const WriterProperties& propertie
   return level;
 }
+enum class FieldContext {
+  kTopLevel,
+  kStruct,
+  kListElement,
+  kMapKey,
+  kMapValue,
+};
+
+Result<std::optional<SchemaField>> PruneUnknownField(const SchemaField& field,
+                                                     FieldContext context) {
+  if (field.type()->type_id() == TypeId::kUnknown) {
+    ICEBERG_PRECHECK(context != FieldContext::kMapKey,
+                     "Cannot write map key '{}' of unknown type because it has no "
+                     "physical Avro representation",
+                     field.name());
+    ICEBERG_PRECHECK(field.optional(), "Unknown type field '{}' must be optional",
+                     field.name());
+    if (context == FieldContext::kListElement || context == FieldContext::kMapValue) {
+      return field;
+    }
Review Comment:
Sorry, I may have been misleading in my earlier comment. I think your
original logic should be correct.
For Parquet, rejecting `list<unknown>` and `map<..., unknown>` makes sense
because there is no physical value column to write. Pruning the unknown element
or value would lose list cardinality or map keys.
Avro is different. This PR already maps `UnknownType` to `AVRO_NULL`, so
`list<unknown>` can be represented as an Avro array with null items, and
`map<string, unknown>` as an Avro map with null values. That preserves list
lengths and map keys.
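To make the distinction concrete, here is a minimal sketch of the per-context decision and of the plain Avro schemas involved. Everything in it is a simplified stand-in, not the iceberg-cpp API: `UnknownAction`, `DecideUnknownField`, `AvroListOfUnknown`, and `AvroMapOfUnknown` are hypothetical names, exceptions stand in for `ICEBERG_PRECHECK`, and the pruning of struct and top-level fields is an assumption, since that part of the function is not shown in the hunk above.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical, simplified stand-ins; these are not the iceberg-cpp types.
enum class FieldContext { kTopLevel, kStruct, kListElement, kMapKey, kMapValue };
enum class UnknownAction { kPrune, kWriteAsNull };

// Decide how an unknown-typed field is handled at a given position.
// Throws std::invalid_argument where the PR uses ICEBERG_PRECHECK.
inline UnknownAction DecideUnknownField(FieldContext context, bool optional) {
  if (context == FieldContext::kMapKey) {
    throw std::invalid_argument("map key of unknown type has no Avro representation");
  }
  if (!optional) {
    throw std::invalid_argument("unknown type field must be optional");
  }
  if (context == FieldContext::kListElement || context == FieldContext::kMapValue) {
    // Keep the field: Avro "null" items/values preserve list lengths and map keys.
    return UnknownAction::kWriteAsNull;
  }
  // Assumption: struct and top-level fields are dropped from the write schema.
  return UnknownAction::kPrune;
}

// Plain Avro schema JSON for list<unknown> and map<string, unknown>.
// Iceberg-specific properties such as field ids are omitted here.
inline std::string AvroListOfUnknown() {
  return R"({"type":"array","items":"null"})";
}
inline std::string AvroMapOfUnknown() {
  return R"({"type":"map","values":"null"})";
}
```

With this shape, a list element or map value of unknown type is kept and written as null, while a map key of unknown type is rejected up front.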
This also matches the Java impl: `TypeToSchema` maps `UNKNOWN` to
`NULL_SCHEMA`, and the Spark Avro test suite includes coverage for both
`testUnknownListType` and `testUnknownMapType`.
cc @wgtmac to double-check my understanding.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.