wgtmac commented on code in PR #662:
URL: https://github.com/apache/iceberg-cpp/pull/662#discussion_r3356654337
##########
src/iceberg/test/update_schema_test.cc:
##########
@@ -1054,6 +1057,113 @@ TEST_F(UpdateSchemaTest, UpdateColumnFloatToDouble) {
EXPECT_EQ(*field_opt->get().type(), *float64());
}
+TEST_F(UpdateSchemaTest, UpdateColumnUnknownToPrimitive) {
+ ICEBERG_UNWRAP_OR_FAIL(auto update, table_->NewUpdateSchema());
+ update->AddColumn("mystery", unknown(), "A null-only placeholder");
Review Comment:
Doesn't it require format version v3 to succeed?
##########
src/iceberg/avro/avro_writer.cc:
##########
@@ -81,6 +89,126 @@ Result<std::optional<int32_t>> ParseCodecLevel(const
WriterProperties& propertie
return level;
}
+enum class FieldContext {
+ kTopLevel,
+ kStruct,
+ kListElement,
+ kMapKey,
+ kMapValue,
+};
+
+Result<std::optional<SchemaField>> PruneUnknownField(const SchemaField& field,
Review Comment:
I don't think it is worth this significant change but only top-level unknown
fields are pruned. We should just simplify this by always writing AVRO_NULL for
unknown types. Same for parquet writer.
##########
src/iceberg/update/update_schema.h:
##########
@@ -180,23 +180,23 @@ class ICEBERG_EXPORT UpdateSchema : public PendingUpdate {
/// this change conflicts with other additions, renames, or updates.
UpdateSchema& RenameColumn(std::string_view name, std::string_view new_name);
- /// \brief Update a column in the schema to a new primitive type.
+ /// \brief Update a column in the schema to a new type.
///
/// The name is used to find the column to update using
Schema::FindFieldByName().
///
- /// Only updates that widen types are allowed.
+ /// Only updates that widen types are allowed. Unknown type placeholders may
be
+ /// updated to any type.
///
/// Columns may be updated and renamed in the same schema update.
///
/// \param name Name of the column to update.
- /// \param new_type Replacement type for the column (must be primitive).
+ /// \param new_type Replacement type for the column.
/// \return Reference to this for method chaining.
/// \note InvalidArgument will be reported if name doesn't identify a column
in the
/// schema or if
/// this change introduces a type incompatibility or if it conflicts
with
/// other additions, renames, or updates.
- UpdateSchema& UpdateColumn(std::string_view name,
- std::shared_ptr<PrimitiveType> new_type);
+ UpdateSchema& UpdateColumn(std::string_view name, std::shared_ptr<Type>
new_type);
Review Comment:
I don't think this is the right direction. As per the spec, unknown type can
only be promoted to primitive type. The Java implementation follows this rule
so it is consistent.
##########
src/iceberg/json_serde.cc:
##########
@@ -441,12 +443,22 @@ Result<std::unique_ptr<Type>> StructTypeFromJson(const
nlohmann::json& json) {
return std::make_unique<StructType>(std::move(fields));
}
+Status ValidateUnknownFieldOptional(const Type& type, bool optional,
+ std::string_view field_name) {
+ if (type.type_id() == TypeId::kUnknown && !optional) {
+ return JsonParseError("Unknown type field '{}' must be optional",
field_name);
+ }
+ return {};
+}
+
Result<std::unique_ptr<Type>> ListTypeFromJson(const nlohmann::json& json) {
ICEBERG_ASSIGN_OR_RAISE(auto element_type, TypeFromJson(json[kElement]));
ICEBERG_ASSIGN_OR_RAISE(auto element_id, GetJsonValue<int32_t>(json,
kElementId));
ICEBERG_ASSIGN_OR_RAISE(auto element_required,
GetJsonValue<bool>(json, kElementRequired));
+ ICEBERG_RETURN_UNEXPECTED(ValidateUnknownFieldOptional(*element_type,
!element_required,
Review Comment:
This check is duplicate since Schema::Validate will cover the same thing.
##########
src/iceberg/avro/avro_schema_util.cc:
##########
@@ -650,6 +661,35 @@ Result<FieldProjection> ProjectNested(const Type&
expected_type,
const ::avro::NodePtr& avro_node,
bool prune_source);
+Result<FieldProjection> ProjectField(const SchemaField& expected_field,
Review Comment:
Nice refactoring!
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]