huan233usc commented on code in PR #2653:
URL: https://github.com/apache/iceberg-rust/pull/2653#discussion_r3423334645
##########
crates/iceberg/src/spec/datatypes.rs:
##########
@@ -204,6 +207,157 @@ impl From<MapType> for Type {
}
}
+/// Iceberg geometry type.
+#[derive(Debug, Serialize, Deserialize, PartialEq, Eq, Clone, Hash, Default)]
+pub struct GeometryType {
+ crs: Option<String>,
+}
+
+impl GeometryType {
+ /// Creates a geometry type with an optional coordinate reference system.
+ pub fn new(crs: Option<String>) -> Result<Self> {
+ Ok(Self {
+ crs: normalize_crs(crs)?,
+ })
+ }
+
+ /// Returns the coordinate reference system, or `None` for the Iceberg
default CRS.
+ pub fn crs(&self) -> Option<&str> {
+ self.crs.as_deref()
+ }
+}
+
+/// Iceberg geography type.
+#[derive(Debug, Serialize, Deserialize, PartialEq, Eq, Clone)]
+pub struct GeographyType {
+ crs: Option<String>,
+ algorithm: WkbEdges,
+}
+
+impl Default for GeographyType {
+ fn default() -> Self {
+ Self {
+ crs: None,
+ algorithm: WkbEdges::Spherical,
+ }
+ }
+}
+
+impl Hash for GeographyType {
+ fn hash<H: Hasher>(&self, state: &mut H) {
+ self.crs.hash(state);
+ wkb_edges_as_str(self.algorithm).hash(state);
+ }
+}
+
+impl GeographyType {
+ /// Creates a geography type with an optional coordinate reference system
and edge interpolation algorithm.
+ pub fn new(crs: Option<String>, algorithm: WkbEdges) -> Result<Self> {
Review Comment:
The public `iceberg::spec` API exposes `parquet_geospatial::WkbEdges`
(`GeographyType::new`/`algorithm()`), coupling the type layer to the
parquet-geospatial crate. The set of edge algorithms is defined by the spec
itself
([edge-interpolation-algorithm](https://iceberg.apache.org/spec/#edge-interpolation-algorithm))
— could we define an Iceberg-owned enum mirroring the spec and convert to
`WkbEdges` only at the Parquet boundary, keeping the type layer format-agnostic?
##########
crates/iceberg/src/spec/values/datum.rs:
##########
@@ -417,8 +417,10 @@ impl Datum {
PrimitiveType::Uuid => {
PrimitiveLiteral::UInt128(u128::from_be_bytes(bytes.try_into()?))
}
- PrimitiveType::Fixed(_) =>
PrimitiveLiteral::Binary(Vec::from(bytes)),
- PrimitiveType::Binary =>
PrimitiveLiteral::Binary(Vec::from(bytes)),
+ PrimitiveType::Fixed(_)
+ | PrimitiveType::Binary
+ | PrimitiveType::Geometry(_)
+ | PrimitiveType::Geography(_) =>
PrimitiveLiteral::Binary(Vec::from(bytes)),
Review Comment:
The single-value path decodes geo bytes as opaque WKB. Per the spec, geo
bounds are a single point encoded as `x:y[:z][:m]` little-endian f64s, not WKB
([bound-serialization](https://iceberg.apache.org/spec/#bound-serialization),
[bounds-for-geometry-and-geography](https://iceberg.apache.org/spec/#bounds-for-geometry-and-geography)).
Both implementations skip spatial bounds today so it's latent, but worth
agreeing the bound codec follows the spec point encoding so they don't diverge
later.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]