zhjwpku commented on code in PR #662:
URL: https://github.com/apache/iceberg-cpp/pull/662#discussion_r3342607218


##########
src/iceberg/avro/avro_writer.cc:
##########
@@ -81,6 +89,126 @@ Result<std::optional<int32_t>> ParseCodecLevel(const WriterProperties& propertie
   return level;
 }
 
+enum class FieldContext {
+  kTopLevel,
+  kStruct,
+  kListElement,
+  kMapKey,
+  kMapValue,
+};
+
+Result<std::optional<SchemaField>> PruneUnknownField(const SchemaField& field,
+                                                     FieldContext context) {
+  if (field.type()->type_id() == TypeId::kUnknown) {
+    ICEBERG_PRECHECK(context != FieldContext::kMapKey,
+                     "Cannot write map key '{}' of unknown type because it has no "
+                     "physical Avro representation",
+                     field.name());
+    ICEBERG_PRECHECK(field.optional(), "Unknown type field '{}' must be optional",
+                     field.name());
+    if (context == FieldContext::kListElement || context == FieldContext::kMapValue) {
+      return field;
+    }

Review Comment:
   Sorry, my earlier comment may have been misleading. I think your original 
logic was correct.
   
   For Parquet, rejecting `list<unknown>` and `map<..., unknown>` makes sense 
because there is no physical value column to write. Pruning the unknown element 
or value would lose list cardinality or map keys.
   
   Avro is different. This PR already maps `UnknownType` to `AVRO_NULL`, so 
`list<unknown>` can be represented as an Avro array with null items, and 
`map<string, unknown>` as an Avro map with null values. That preserves list 
lengths and map keys.
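
To make the "preserves list lengths and map keys" point concrete, here is a minimal sketch of the Avro binary layout for an array of nulls and a map with null values. It assumes the item/value schema is plain `null` (not a union, which would add a branch index per item) and that all entries go into a single block. `EncodeLong`, `EncodeNullArray`, and `EncodeNullValuedMap` are hypothetical helper names for illustration only; they are not part of iceberg-cpp or avro-cpp.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Avro writes int/long values as zigzag-encoded, variable-length integers.
void EncodeLong(int64_t value, std::vector<uint8_t>& out) {
  uint64_t zigzag =
      (static_cast<uint64_t>(value) << 1) ^ static_cast<uint64_t>(value >> 63);
  while (zigzag >= 0x80) {
    out.push_back(static_cast<uint8_t>((zigzag & 0x7F) | 0x80));
    zigzag >>= 7;
  }
  out.push_back(static_cast<uint8_t>(zigzag));
}

// Hypothetical helper: encodes array<null> holding `count` items.
// A null item occupies zero bytes, so only the block count and the
// terminating zero-count block are written. The count is what carries
// the list length.
std::vector<uint8_t> EncodeNullArray(int64_t count) {
  std::vector<uint8_t> out;
  if (count > 0) {
    EncodeLong(count, out);  // single block containing every item
  }
  EncodeLong(0, out);  // end-of-array marker
  return out;
}

// Hypothetical helper: encodes map<string, null> for the given keys.
// Keys are written as length-prefixed UTF-8 strings; null values add nothing.
std::vector<uint8_t> EncodeNullValuedMap(const std::vector<std::string>& keys) {
  std::vector<uint8_t> out;
  if (!keys.empty()) {
    EncodeLong(static_cast<int64_t>(keys.size()), out);
    for (const auto& key : keys) {
      EncodeLong(static_cast<int64_t>(key.size()), out);
      out.insert(out.end(), key.begin(), key.end());
    }
  }
  EncodeLong(0, out);  // end-of-map marker
  return out;
}
```

Under these assumptions, `EncodeNullArray(3)` yields the two bytes `06 00`, and `EncodeNullValuedMap({"a", "b"})` yields `04 02 61 02 62 00`, so a reader can recover both the element count and the keys even though no value bytes are written.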
   
   This also matches the Java implementation: `TypeToSchema` maps `UNKNOWN` to 
`NULL_SCHEMA`, and the Spark Avro test suite covers both 
`testUnknownListType` and `testUnknownMapType`.
   
   cc @wgtmac to double-check my understanding.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

