dannycjones commented on code in PR #2469:
URL: https://github.com/apache/iceberg-rust/pull/2469#discussion_r3383204056
##########
crates/iceberg/src/spec/table_properties.rs:
##########
@@ -123,7 +125,43 @@ pub struct TableProperties {
pub gc_enabled: bool,
}
+impl Default for TableProperties {
+ fn default() -> Self {
+ Self::try_from(&HashMap::new()).expect("default properties should
always be valid")
+ }
+}
+
impl TableProperties {
+ /// Get a property value by key.
+ pub fn get(&self, key: &str) -> Option<&String> {
+ self.raw.get(key)
+ }
+
Review Comment:
For this and other functions, we should return `&str` rather than `&String`.
##########
crates/iceberg/src/spec/table_properties.rs:
##########
@@ -123,7 +125,43 @@ pub struct TableProperties {
pub gc_enabled: bool,
}
+impl Default for TableProperties {
+ fn default() -> Self {
+ Self::try_from(&HashMap::new()).expect("default properties should
always be valid")
+ }
+}
+
impl TableProperties {
+ /// Get a property value by key.
+ pub fn get(&self, key: &str) -> Option<&String> {
+ self.raw.get(key)
+ }
+
+ /// Check if a property key exists.
+ pub fn contains_key(&self, key: &str) -> bool {
+ self.raw.contains_key(key)
+ }
+
+ /// Iterate over all raw properties.
+ pub fn iter(&self) -> impl Iterator<Item = (&String, &String)> {
+ self.raw.iter()
+ }
+
+ /// Return the number of raw properties.
+ pub fn len(&self) -> usize {
+ self.raw.len()
+ }
+
+ /// Return whether the raw properties map is empty.
+ pub fn is_empty(&self) -> bool {
+ self.raw.is_empty()
+ }
Review Comment:
RustDoc is leaking internals.
This feedback should be applied across all of `impl TableProperties`.
```suggestion
/// Return if any table properties are set.
pub fn is_empty(&self) -> bool {
self.raw.is_empty()
}
```
##########
crates/iceberg/src/spec/table_properties.rs:
##########
@@ -123,7 +125,43 @@ pub struct TableProperties {
pub gc_enabled: bool,
}
+impl Default for TableProperties {
+ fn default() -> Self {
+ Self::try_from(&HashMap::new()).expect("default properties should
always be valid")
+ }
+}
+
impl TableProperties {
+ /// Get a property value by key.
+ pub fn get(&self, key: &str) -> Option<&String> {
+ self.raw.get(key)
+ }
+
+ /// Check if a property key exists.
+ pub fn contains_key(&self, key: &str) -> bool {
+ self.raw.contains_key(key)
+ }
+
+ /// Iterate over all raw properties.
+ pub fn iter(&self) -> impl Iterator<Item = (&String, &String)> {
+ self.raw.iter()
+ }
+
+ /// Return the number of raw properties.
+ pub fn len(&self) -> usize {
+ self.raw.len()
+ }
Review Comment:
Similar to a comment below
(https://github.com/apache/iceberg-rust/pull/2469/changes#r3383210705), I think
we should focus less on raw properties. I do however think the clarification is
OK though.
```suggestion
/// Return the number of table properties.
///
/// This includes any table property, including those that may not be
parsed by this struct.
pub fn len(&self) -> usize {
self.raw.len()
}
```
##########
crates/iceberg/src/spec/table_metadata.rs:
##########
@@ -1407,7 +1402,7 @@ pub(super) mod _serde {
properties: if v.properties.is_empty() {
None
} else {
- Some(v.properties)
+ Some(v.properties.as_raw().clone())
Review Comment:
Looks like we probably want an `TableProperties::into_inner(self) ->
HashMap<String, String>`.
Before this change, no clone was required.
##########
crates/iceberg/src/spec/table_metadata.rs:
##########
@@ -91,10 +91,10 @@ pub struct TableMetadata {
pub(crate) default_partition_type: StructType,
/// An integer; the highest assigned partition field ID across all
partition specs for the table.
pub(crate) last_partition_id: i32,
- ///A string to string map of table properties. This is used to control
settings that
- /// affect reading and writing and is not intended to be used for
arbitrary metadata.
- /// For example, commit.retry.num-retries is used to control the number of
commit retries.
- pub(crate) properties: HashMap<String, String>,
+ /// Table properties combining typed, parsed fields (commit retries, file
format,
+ /// compression, etc.) with the raw key-value map for arbitrary/unknown
properties
+ /// and spec-compatible JSON serialization.
+ pub(crate) properties: TableProperties,
Review Comment:
We're removing guidance on avoiding the use of this for arbitrary metadata.
I don't have a strong opinion here, just acknowledging it. Is this actual
guidance we want to keep? (Arbitrary table properties have been causing me pain
in the past.)
##########
crates/iceberg/src/spec/table_properties.rs:
##########
@@ -123,7 +125,43 @@ pub struct TableProperties {
pub gc_enabled: bool,
}
+impl Default for TableProperties {
+ fn default() -> Self {
+ Self::try_from(&HashMap::new()).expect("default properties should
always be valid")
+ }
+}
+
impl TableProperties {
+ /// Get a property value by key.
+ pub fn get(&self, key: &str) -> Option<&String> {
+ self.raw.get(key)
+ }
+
+ /// Check if a property key exists.
+ pub fn contains_key(&self, key: &str) -> bool {
+ self.raw.contains_key(key)
+ }
+
+ /// Iterate over all raw properties.
+ pub fn iter(&self) -> impl Iterator<Item = (&String, &String)> {
+ self.raw.iter()
+ }
+
+ /// Return the number of raw properties.
+ pub fn len(&self) -> usize {
+ self.raw.len()
+ }
+
+ /// Return whether the raw properties map is empty.
+ pub fn is_empty(&self) -> bool {
+ self.raw.is_empty()
+ }
+
+ /// Return a reference to the raw properties map.
+ pub fn as_raw(&self) -> &HashMap<String, String> {
+ &self.raw
+ }
Review Comment:
My gut is to avoid leaking the dependency on `HashMap` after we've
constructed `TableProperties`, and instead expose it via the streaming method
and `get`, `contains_key`.
I see we do need it for serialization, maybe we keep it as `pub(crate)` (or
replace with `pub(crate) fn into_inner(self) -> HashMap<String, String>`).
##########
crates/iceberg/src/catalog/memory/catalog.rs:
##########
@@ -541,7 +541,7 @@ pub(crate) mod tests {
vec![&expected_sorted_order]
);
- assert_eq!(metadata.properties(), &HashMap::new());
+ assert_eq!(metadata.properties().as_raw(), &HashMap::new());
Review Comment:
```suggestion
assert!(metadata.properties().is_empty());
```
the same for a bunch of these catalog tests
##########
crates/catalog/glue/src/utils.rs:
##########
@@ -340,7 +340,7 @@ mod tests {
&table_name,
metadata_location,
&metadata,
- metadata.properties(),
+ metadata.properties().as_raw(),
None,
)?;
Review Comment:
Please update this function to use `&TableProperties`.
##########
crates/catalog/hms/src/catalog.rs:
##########
@@ -530,7 +530,7 @@ impl Catalog for HmsCatalog {
table_name.clone(),
location,
metadata_location_str.clone(),
- metadata.properties(),
+ metadata.properties().as_raw(),
)?;
Review Comment:
Please update this function to use `&TableProperties`.
##########
crates/iceberg/src/spec/table_metadata.rs:
##########
@@ -358,31 +358,14 @@ impl TableMetadata {
self.default_sort_order_id
}
- /// Returns properties of table.
+ /// Returns table properties, providing both typed access to known
properties
+ /// (e.g., `commit_num_retries`, `metadata_compression_codec`) and raw
key-value
+ /// access for arbitrary/unknown properties via [`TableProperties::get`].
#[inline]
- pub fn properties(&self) -> &HashMap<String, String> {
+ pub fn properties(&self) -> &TableProperties {
&self.properties
}
Review Comment:
The Rustdoc goes into details of the type its returning. I think just
stating that it returns the table properties is enough.
```suggestion
/// Returns table properties.
#[inline]
pub fn properties(&self) -> &TableProperties {
&self.properties
}
```
##########
crates/iceberg/src/spec/table_metadata_builder.rs:
##########
@@ -308,9 +310,11 @@ impl TableMetadataBuilder {
));
}
+ let mut raw = self.metadata.properties.as_raw().clone();
for property in &properties {
- self.metadata.properties.remove(property);
+ raw.remove(property);
}
+ self.metadata.properties = TableProperties::try_from(&raw)?;
Review Comment:
not a suggestion but starting a conversation: should `TableProperties` be
exposing mutation APIs instead of being treated as immutable?
##########
crates/catalog/rest/src/catalog.rs:
##########
Review Comment:
Rewrite the tests here not to use hashmap but to use `TableProperties::get`
to compare.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]