blackmwk commented on code in PR #2188:
URL: https://github.com/apache/iceberg-rust/pull/2188#discussion_r3348211190
##########
crates/iceberg/src/spec/datatypes.rs:
##########
@@ -844,6 +882,42 @@ impl MapType {
}
}
+/// Variant type - can hold semi-structured data of any type.
+/// This is an Iceberg V3 feature.
+#[derive(Debug, PartialEq, Eq, Clone, Copy, Hash)]
+pub struct VariantType;
Review Comment:
Why we need this empty struct? I think we could remove this?
##########
crates/iceberg/src/spec/schema/mod.rs:
##########
@@ -421,6 +425,67 @@ impl Schema {
pub fn field_id_to_fields(&self) -> &HashMap<i32, NestedFieldRef> {
&self.id_to_field
}
+
+ /// Returns the minimum [`FormatVersion`] required to represent all types
in this schema.
+ ///
+ /// Defaults to `FormatVersion::V1` if all types are universally supported.
+ pub fn min_format_version(&self) -> FormatVersion {
+ self.as_struct()
+ .fields()
+ .iter()
+ .map(|f| f.field_type.min_format_version())
+ .max()
+ .unwrap_or(FormatVersion::V1)
+ }
+
+ /// Check that all types in this schema are supported by the given format
version.
+ ///
+ /// Mirrors Java's `Schema.checkCompatibility()`; returns an error listing
every
+ /// incompatible field. Two checks per field:
+ ///
+ /// - **Type**: `TimestampNs` / `TimestamptzNs` / `Variant` require v3+.
+ /// - **Initial default**: a non-null `initial_default` requires
+ /// [`MIN_FORMAT_VERSION_DEFAULT_VALUES`] — it backfills pre-existing
rows,
+ /// which older readers can't honor. `write_default` only affects newly
+ /// written rows (physically materialized, read the same at any
version), so
+ /// it is not checked.
+ pub fn check_format_compatibility(&self, format_version: FormatVersion) ->
Result<()> {
+ let mut problems: Vec<String> = Vec::new();
+
+ for field in self.id_to_field.values() {
+ let name = self
+ .name_by_field_id(field.id)
+ .unwrap_or(field.name.as_str());
+
+ let min_version = field.field_type.min_format_version();
Review Comment:
This is error prone. Java's approach uses TypeID, but this method will calls
recursively again and again for nested data types.
##########
crates/iceberg/src/spec/schema/mod.rs:
##########
@@ -421,6 +425,67 @@ impl Schema {
pub fn field_id_to_fields(&self) -> &HashMap<i32, NestedFieldRef> {
&self.id_to_field
}
+
+ /// Returns the minimum [`FormatVersion`] required to represent all types
in this schema.
+ ///
+ /// Defaults to `FormatVersion::V1` if all types are universally supported.
+ pub fn min_format_version(&self) -> FormatVersion {
Review Comment:
This api looks odd to me, it makes me feel like reading a field, which it's
not that cheap. I think a better solution is to add a `SchemaVisitor`
implementation, which check if it's compatible with some FormatVersion.
##########
crates/iceberg/src/spec/schema/prune_columns.rs:
##########
Review Comment:
When you change these implementations, I think we need to add some ut the
verify that these changes are correct.
##########
crates/iceberg/src/spec/schema/mod.rs:
##########
@@ -421,6 +425,67 @@ impl Schema {
pub fn field_id_to_fields(&self) -> &HashMap<i32, NestedFieldRef> {
&self.id_to_field
}
+
+ /// Returns the minimum [`FormatVersion`] required to represent all types
in this schema.
+ ///
+ /// Defaults to `FormatVersion::V1` if all types are universally supported.
+ pub fn min_format_version(&self) -> FormatVersion {
+ self.as_struct()
+ .fields()
+ .iter()
+ .map(|f| f.field_type.min_format_version())
+ .max()
+ .unwrap_or(FormatVersion::V1)
+ }
+
+ /// Check that all types in this schema are supported by the given format
version.
+ ///
+ /// Mirrors Java's `Schema.checkCompatibility()`; returns an error listing
every
+ /// incompatible field. Two checks per field:
+ ///
+ /// - **Type**: `TimestampNs` / `TimestamptzNs` / `Variant` require v3+.
+ /// - **Initial default**: a non-null `initial_default` requires
+ /// [`MIN_FORMAT_VERSION_DEFAULT_VALUES`] — it backfills pre-existing
rows,
+ /// which older readers can't honor. `write_default` only affects newly
+ /// written rows (physically materialized, read the same at any
version), so
+ /// it is not checked.
+ pub fn check_format_compatibility(&self, format_version: FormatVersion) ->
Result<()> {
Review Comment:
See the comment below, we should use a visitor for this.
##########
crates/integration_tests/tests/read_variant.rs:
##########
Review Comment:
I don't understand why you put these tests in integration tests. We should
gradually remove this integration tests, so we should not do this unnecessary
we have no other choice. These tests are almost all about arrow readers, why
not put them in arrow module?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]