wgtmac commented on code in PR #662:
URL: https://github.com/apache/iceberg-cpp/pull/662#discussion_r3286530627


##########
src/iceberg/type_fwd.h:
##########
@@ -31,7 +31,7 @@ namespace iceberg {
 /// This is not a complete data type by itself because some types are nested
 /// and/or parameterized.
 ///
-/// Iceberg V3 types are not currently supported.
+/// Iceberg V3's `unknown` type is supported as a null-only placeholder type.

Review Comment:
   Please register `TypeId::kUnknown -> 3` in 
`TableMetadata::kMinFormatVersions`. This advertises a v3-only type, but the 
validation table is still empty, so v1/v2 metadata can contain `unknown`. Java 
rejects `UNKNOWN` before v3.



##########
src/iceberg/util/type_util.cc:
##########
@@ -426,6 +426,10 @@ bool IsPromotionAllowed(const std::shared_ptr<Type>& 
from_type,
   TypeId from_id = from_type->type_id();
   TypeId to_id = to_type->type_id();
 
+  if (from_id == TypeId::kUnknown) {

Review Comment:
   Java does not allow `UNKNOWN` as a schema-update promotion source. With 
this, `UpdateColumn` can turn an `unknown` column into any primitive type, 
which diverges from Java's `TypeUtil.isPromotionAllowed` rules.



##########
src/iceberg/json_serde.cc:
##########
@@ -540,6 +558,7 @@ Result<std::unique_ptr<SchemaField>> FieldFromJson(const 
nlohmann::json& json) {
   ICEBERG_ASSIGN_OR_RAISE(auto required, GetJsonValue<bool>(json, kRequired));
   ICEBERG_ASSIGN_OR_RAISE(auto doc, GetJsonValueOrDefault<std::string>(json, 
kDoc));
 
+  ICEBERG_RETURN_UNEXPECTED(ValidateUnknownFieldOptional(*type, !required, 
name));

Review Comment:
   This only rejects required `unknown` during JSON parsing. The public 
construction/update paths still allow it via `SchemaField::MakeRequired(..., 
unknown())` and `AllowIncompatibleChanges().AddRequiredColumn(..., unknown())`. 
The invariant should live in the model/update path too.



##########
src/iceberg/schema_internal.cc:
##########
@@ -141,6 +141,9 @@ ArrowErrorCode ToArrowSchema(const Type& type, bool 
optional, std::string_view n
           ArrowMetadataBuilderAppend(&metadata_buffer, 
ArrowCharView(kArrowExtensionName),
                                      ArrowCharView(kArrowUuidExtensionName)));
     } break;
+    case TypeId::kUnknown:

Review Comment:
   Java Parquet skips unknown fields on write. Mapping `unknown` to Arrow null 
here makes the Parquet writer emit a physical null column, or depend on Arrow 
rejecting it. The write path should skip unknown columns instead.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to