wgtmac commented on code in PR #662:
URL: https://github.com/apache/iceberg-cpp/pull/662#discussion_r3287137948


##########
src/iceberg/test/update_schema_test.cc:
##########
@@ -1054,6 +1054,36 @@ TEST_F(UpdateSchemaTest, UpdateColumnFloatToDouble) {
   EXPECT_EQ(*field_opt->get().type(), *float64());
 }
 
+TEST_F(UpdateSchemaTest, UpdateColumnUnknownToPrimitiveFails) {

Review Comment:
   Spec v3 allows `unknown` to promote to any primitive type. This should 
succeed; `unknown -> struct/list/map` should be rejected instead.



##########
src/iceberg/parquet/parquet_writer.cc:
##########
@@ -71,6 +78,64 @@ Result<std::optional<int32_t>> ParseCodecLevel(const 
WriterProperties& propertie
   return level;
 }
 
+std::optional<std::shared_ptr<Type>> PruneUnknownType(const 
std::shared_ptr<Type>& type);
+
+std::optional<SchemaField> PruneUnknownField(const SchemaField& field) {
+  auto pruned_type = PruneUnknownType(field.type());
+  if (!pruned_type.has_value()) {
+    return std::nullopt;
+  }
+  return SchemaField(field.field_id(), field.name(), 
std::move(pruned_type.value()),
+                     field.optional(), field.doc());
+}
+
+std::optional<std::shared_ptr<Type>> PruneUnknownType(const 
std::shared_ptr<Type>& type) {
+  switch (type->type_id()) {
+    case TypeId::kUnknown:
+      return std::nullopt;
+    case TypeId::kStruct: {
+      const auto& struct_type = static_cast<const StructType&>(*type);
+      std::vector<SchemaField> fields;
+      for (const auto& field : struct_type.fields()) {
+        if (auto pruned_field = PruneUnknownField(field)) {
+          fields.emplace_back(std::move(pruned_field.value()));
+        }
+      }
+      return std::make_shared<StructType>(std::move(fields));
+    }
+    case TypeId::kList: {
+      const auto& list_type = static_cast<const ListType&>(*type);
+      auto pruned_element = PruneUnknownField(list_type.element());
+      if (!pruned_element.has_value()) {

Review Comment:
   Pruning `list<unknown>` or `map<..., unknown>` drops the whole container, 
including list lengths or map keys. Java rejects these on write; only 
top-level/struct unknown fields should be skipped.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to