huan233usc opened a new pull request, #731:
URL: https://github.com/apache/iceberg-cpp/pull/731

   Closes #730 (item 2 of #637).
   
   Implements Iceberg v3 column default values: `initial-default` / `write-default` on the
   schema, JSON serde, read-path application, schema-evolution support, and format-version
   validation.
   
   ## What changed
   
   ### Schema model
   - `SchemaField` carries optional `initial_default` / `write_default` literals
     (`std::shared_ptr<Literal>` to keep `schema_field.h` free of the
     `literal.h → type.h → schema_field.h` include cycle), with `WithInitialDefault` /
     `WithWriteDefault` copy-modifiers in the style of `AsRequired`/`AsOptional`.
   - `SchemaField::Validate()` checks that defaults are primitive literals matching the
     field type; `Schema::Validate(format_version)` rejects schemas with defaults below v3
     (uses the previously-unused `TableMetadata::kMinFormatVersionDefaultValues`, resolving
     the TODO there).
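
   The shape described above can be sketched in isolation. This is a minimal illustration,
   not the iceberg-cpp code: `TypeId`, `Literal`, `Field`, and `ValidateSchema` are
   hypothetical stand-ins, and the constant's value of 3 is inferred from "rejects schemas
   with defaults below v3".

```cpp
#include <memory>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for illustration; NOT the iceberg-cpp types.
enum class TypeId { kInt, kString };

struct Literal {
  TypeId type;
  std::string text;  // textual value, enough for the sketch
};

class Field {
 public:
  Field(std::string name, TypeId type) : name_(std::move(name)), type_(type) {}

  // Copy-modifiers: return a modified copy and leave *this untouched.
  Field WithInitialDefault(Literal value) const {
    Field copy = *this;
    copy.initial_default_ = std::make_shared<Literal>(std::move(value));
    return copy;
  }
  Field WithWriteDefault(Literal value) const {
    Field copy = *this;
    copy.write_default_ = std::make_shared<Literal>(std::move(value));
    return copy;
  }

  const std::string& name() const { return name_; }
  bool HasDefault() const { return initial_default_ || write_default_; }

  // A default is acceptable only if its literal type matches the field type.
  bool DefaultsMatchType() const {
    auto ok = [this](const std::shared_ptr<const Literal>& d) {
      return !d || d->type == type_;
    };
    return ok(initial_default_) && ok(write_default_);
  }

 private:
  std::string name_;
  TypeId type_;
  // A shared_ptr member only needs a forward declaration of Literal in a real
  // header, which is how an include cycle can be avoided.
  std::shared_ptr<const Literal> initial_default_;
  std::shared_ptr<const Literal> write_default_;
};

// Assumed value: defaults are a format-version 3 feature per the description.
constexpr int kMinFormatVersionDefaultValues = 3;

// Returns an error message, or std::nullopt when the schema is acceptable.
std::optional<std::string> ValidateSchema(const std::vector<Field>& fields,
                                          int format_version) {
  for (const auto& f : fields) {
    if (!f.DefaultsMatchType()) return "default type mismatch: " + f.name();
    if (f.HasDefault() && format_version < kMinFormatVersionDefaultValues) {
      return "default values require format version 3: " + f.name();
    }
  }
  return std::nullopt;
}
```

   In this sketch a field with a matching default passes at version 3 and is rejected at
   version 2, while a mismatched default is rejected at any version.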
   
   ### JSON serde
   - `FieldFromJson` / `ToJson(SchemaField)` parse and write `initial-default` /
     `write-default` using the existing single-value serialization
     (`LiteralFromJson(json, type)`), resolving the `add default values` TODO in struct
     serialization. All primitive types are supported (incl. decimal, fixed, uuid, temporal).
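
   As a rough picture of the resulting field JSON, the sketch below emits the two keys
   only when a default is present. It is hand-rolled for illustration and does not use the
   library's serde: `FieldJsonInput` and `FieldToJsonText` are hypothetical names, the
   defaults are assumed to be already encoded as JSON single values, and names are assumed
   to need no escaping.

```cpp
#include <optional>
#include <string>

// Hypothetical input for illustration; NOT the iceberg-cpp SchemaField.
struct FieldJsonInput {
  int id;
  std::string name;  // assumed to need no JSON escaping
  std::string type;  // primitive type name, e.g. "int"
  bool required;
  std::optional<std::string> initial_default;  // already JSON-encoded value
  std::optional<std::string> write_default;    // already JSON-encoded value
};

// Builds the field object as text; default keys are omitted when unset.
std::string FieldToJsonText(const FieldJsonInput& f) {
  std::string out = "{\"id\":" + std::to_string(f.id);
  out += ",\"name\":\"" + f.name + "\"";
  out += std::string(",\"required\":") + (f.required ? "true" : "false");
  out += ",\"type\":\"" + f.type + "\"";
  if (f.initial_default) out += ",\"initial-default\":" + *f.initial_default;
  if (f.write_default) out += ",\"write-default\":" + *f.write_default;
  out += "}";
  return out;
}
```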
   
   ### Read path (`initial-default`)
   - `Project()` maps a column missing from a data file to
     `FieldProjection::Kind::kDefault` carrying the literal when the field has an
     `initial-default`. This applies to required *and* optional fields, per the spec
     ("optional with default" reads the default, not null). Resolves the default-value
     TODO in `schema_util.cc`; the Avro-side projection in `avro_schema_util.cc` gets the
     same branch.
   - New `iceberg::arrow` helpers (`literal_util`) convert a `Literal` to an Arrow scalar /
     constant array; the Parquet reader materializes `kDefault` via `MakeDefaultArray` and
     the Avro reader via `AppendDefaultToBuilder`.
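
   The decision for a single expected field can be sketched as follows. This is a
   simplified illustration under assumptions, not the code in `schema_util.cc`:
   `ExpectedField`, `Projection`, and `ResolveField` are hypothetical, the literal is held
   as text, and the `kProjected` / `kNull` kinds and the error for a required field without
   a default are assumed for the sake of the example.

```cpp
#include <optional>
#include <stdexcept>
#include <string>

// Hypothetical kinds for illustration; only kDefault is named in the description.
enum class Kind { kProjected, kDefault, kNull };

struct ExpectedField {
  std::string name;
  bool optional;
  std::optional<std::string> initial_default;  // literal held as text here
};

struct Projection {
  Kind kind;
  std::optional<std::string> literal;  // set only for kDefault
};

// Decides how one expected field is produced when reading a data file.
Projection ResolveField(const ExpectedField& field, bool present_in_file) {
  // A column that exists in the file is read as-is; initial-default is ignored.
  if (present_in_file) return {Kind::kProjected, std::nullopt};
  // A missing column with an initial-default reads the default, whether the
  // field is required or optional.
  if (field.initial_default) return {Kind::kDefault, field.initial_default};
  // A missing optional column without a default reads as null.
  if (field.optional) return {Kind::kNull, std::nullopt};
  throw std::invalid_argument("missing required field without default: " +
                              field.name);
}
```

   The ordering matters: checking `initial_default` before `optional` is what makes an
   optional field with a default read the default instead of null.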
   
   ### Schema evolution (`write-default`)
   - `AddColumn` / `AddRequiredColumn` accept an optional `default_value`, used as both the
     `initial-default` and `write-default` of the new column (Java parity). A required
     column with a default no longer needs `AllowIncompatibleChanges()`.
   - `RequireColumn()` accepts a column added with a default in the same update (resolves
     the defaulted-add TODO in `UpdateColumnRequirementInternal`).
   - New `UpdateColumnDefault()` updates the `write-default` of an existing column
     (`initial-default` stays fixed once the column exists).
   - `UpdateColumnDoc` / `RenameColumn` / `UpdateColumn` preserve defaults when
     reconstructing the field; type promotion casts the defaults to the new type.
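
   The add and update rules above can be modelled with a small stand-alone sketch. It is
   not the `UpdateSchema` implementation: the `SchemaUpdate` class below, its `bool` return
   convention, and `Find` are hypothetical, and defaults are held as text.

```cpp
#include <map>
#include <optional>
#include <string>
#include <utility>

// Hypothetical model for illustration; NOT the iceberg-cpp UpdateSchema class.
struct Column {
  bool required;
  std::optional<std::string> initial_default;
  std::optional<std::string> write_default;
};

class SchemaUpdate {
 public:
  void AllowIncompatibleChanges() { allow_incompatible_ = true; }

  // Returns false when the change is rejected.
  bool AddColumn(const std::string& name, bool required,
                 std::optional<std::string> default_value = std::nullopt) {
    if (columns_.count(name) > 0) return false;
    // A required column is only safe to add when existing rows can be filled
    // from a default, unless the caller opted into incompatible changes.
    if (required && !default_value && !allow_incompatible_) return false;
    // The single default_value seeds both defaults of the new column.
    columns_.emplace(name, Column{required, default_value, default_value});
    return true;
  }

  // Changes only write-default; initial-default stays fixed once the column exists.
  bool UpdateColumnDefault(const std::string& name,
                           std::optional<std::string> value) {
    auto it = columns_.find(name);
    if (it == columns_.end()) return false;
    it->second.write_default = std::move(value);
    return true;
  }

  const Column* Find(const std::string& name) const {
    auto it = columns_.find(name);
    return it == columns_.end() ? nullptr : &it->second;
  }

 private:
  bool allow_incompatible_ = false;
  std::map<std::string, Column> columns_;
};
```

   In this model, adding a required column without a default is rejected, adding it with a
   default succeeds, and a later `UpdateColumnDefault` leaves `initial_default` untouched.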
   
   ### Scope note: write-path application
   Writers in this library consume complete Arrow arrays, so filling omitted columns with
   `write-default` at write time remains the engine's responsibility, as in Java. The
   library's role (storing, validating, and serializing the defaults, and exposing them
   through schema evolution) is covered here.
   
   ## Testing
   
   - Schema serde round-trips (top-level + nested struct fields, mismatch rejection).
   - `Schema::Validate`: v2 rejects defaults, v3 accepts; mismatched default type rejected.
   - Projection: missing required/optional fields with `initial-default` → `kDefault`;
     present fields ignore `initial-default`.
   - Parquet `ProjectRecordBatch` and Avro `AppendDatumToBuilder`: missing columns
     materialize the default at top level and in nested structs.
   - `UpdateSchema`: add with default (both defaults set), required-with-default without
     `AllowIncompatibleChanges()`, mismatched default rejected, `UpdateColumnDefault`,
     `RequireColumn` on defaulted add, doc-update preservation, type-promotion casting,
     and v2 rejection at `Apply()`; new `TableMetadataV3Valid.json` test resource.
   - Full suite passing locally (the pre-existing S3 `file_io_test` is unrelated and fails
     only in my local environment due to a Homebrew AWS-SDK ABI issue; not touched by this
     change).
   
   This pull request and its description were written by Isaac.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

