paleolimbot commented on code in PR #8801:
URL: https://github.com/apache/arrow-rs/pull/8801#discussion_r2505129949
##########
parquet-geospatial/src/crs.rs:
##########
@@ -0,0 +1,97 @@
+use std::{collections::HashMap, sync::Arc};
+
+use arrow_schema::{Schema, SchemaBuilder};
+use serde_json::{Value, json};
+
+#[derive(Debug)]
+pub enum Crs {
+ Projjson(serde_json::Value),
+ Srid(u64),
+ Other(String),
+}
+
+impl Crs {
+ // TODO: make fallible
+ fn try_from_parquet_str(crs: &str, metadata: &HashMap<String, String>) -> Self {
+ let de: Value = serde_json::from_str(crs).unwrap();
+
+ // A CRS that does not exist or is empty defaults to 4326
+ // TODO: http link to parquet geospatial doc
+ let Some(crs) = de["crs"].as_str() else {
+ return Crs::Srid(4326);
+ };
+
+ if let Some(key) = crs.strip_prefix("projjson:") {
+ let Some(proj_meta) = metadata.get(key) else {
+ panic!("Failed to find key in meta: {:?}", metadata)
+ };
+
+ Self::Projjson(serde_json::from_str(proj_meta).unwrap())
+ } else if let Some(srid) = crs.strip_prefix("srid:") {
+ Self::Srid(srid.parse().unwrap())
+ } else {
Review Comment:
> Just to make sure I'm understanding this case correctly. The incoming metadata would look like:
> `{"crs": "<PROJJSON string>"}`
From Parquet, the actual `crs` payload is most commonly `Some("<PROJJSON string>")`, i.e. the PROJJSON text itself rather than a `projjson:<key>` reference. There are two reasons for this: Arrow C++ has no reasonable way to dump the string into the file metadata and write `Some("projjson:<key>")` instead, and most CRSes arrive from GeoArrow as PROJJSON in the first place.
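
A minimal sketch of the Parquet-to-GeoArrow direction described above, assuming a hypothetical helper `resolve_parquet_crs` that receives the logical type's `crs` field and the file's key/value metadata as a `HashMap`. It dereferences `projjson:<key>` only when the key exists and otherwise passes the string through untouched, so the common inline-PROJJSON case needs no special handling:

```rust
use std::collections::HashMap;

/// Hypothetical helper (not part of this PR): turn the `crs` field of a Parquet
/// GEOMETRY/GEOGRAPHY logical type into the string handed on to GeoArrow.
/// `None` means no CRS was written; the caller decides which default applies.
fn resolve_parquet_crs(crs: Option<&str>, kv_metadata: &HashMap<String, String>) -> Option<String> {
    let crs = crs?;
    // `projjson:<key>` points at a key/value metadata entry that holds the PROJJSON.
    if let Some(key) = crs.strip_prefix("projjson:") {
        if let Some(projjson) = kv_metadata.get(key) {
            return Some(projjson.clone());
        }
        // Missing key: keep the raw reference instead of panicking.
    }
    // Inline PROJJSON (the most common case), `srid:<n>`, `AUTH:CODE`, and
    // anything else are passed through unchanged.
    Some(crs.to_string())
}
```

Returning the raw reference on a missing key trades strictness for never losing information, which matches the pass-it-on stance of this comment.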
From GeoArrow, it's possible to get any of:

- `{"crs": "<PROJJSON string>"}` (at least one version of GeoPandas)
- `{"crs": "AUTH:CODE"}`
- `{"crs": {<JSON object of PROJJSON>}}` (most common)
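
A std-only sketch of telling those three GeoArrow shapes apart. `JsonCrsMember`, `GeoArrowCrs`, and `classify_geoarrow_crs` are hypothetical names; `JsonCrsMember` stands in for an already-parsed JSON value such as `serde_json::Value`, and the brace check and the `AUTH:CODE` check are cheap heuristics, not real PROJJSON or authority validation:

```rust
/// Hypothetical stand-in for the parsed `crs` member of GeoArrow extension metadata.
#[derive(Debug, Clone, PartialEq)]
enum JsonCrsMember {
    /// `{"crs": {...}}`: the object, re-serialized to text.
    Object(String),
    /// `{"crs": "..."}`
    Str(String),
}

#[derive(Debug, PartialEq)]
enum GeoArrowCrs {
    /// PROJJSON as a JSON object (the most common form).
    ProjjsonObject(String),
    /// PROJJSON serialized into a JSON string.
    ProjjsonString(String),
    /// `AUTH:CODE`, e.g. `EPSG:4326`.
    AuthCode { authority: String, code: String },
    /// Anything else, kept verbatim.
    Other(String),
}

fn classify_geoarrow_crs(member: JsonCrsMember) -> GeoArrowCrs {
    let s = match member {
        JsonCrsMember::Object(text) => return GeoArrowCrs::ProjjsonObject(text),
        JsonCrsMember::Str(s) => s,
    };
    // Heuristic: a string wrapped in braces is assumed to be serialized PROJJSON.
    let looks_like_json = {
        let t = s.trim();
        t.starts_with('{') && t.ends_with('}')
    };
    if looks_like_json {
        return GeoArrowCrs::ProjjsonString(s);
    }
    // Heuristic: exactly one colon with non-empty parts and no whitespace in the authority.
    if let Some((authority, code)) = s.trim().split_once(':') {
        let plausible = !authority.is_empty()
            && !code.is_empty()
            && !code.contains(':')
            && !authority.contains(char::is_whitespace);
        if plausible {
            return GeoArrowCrs::AuthCode {
                authority: authority.to_string(),
                code: code.to_string(),
            };
        }
    }
    GeoArrowCrs::Other(s)
}
```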
> I would be a little bit concerned about the potential for false positives or potentially malformed PROJ objects
For the specific case of converting Parquet CRSes to GeoArrow ones, I don't think this matters. This implementation's job is to pass on as much CRS information as it can, as accurately as it can, from Parquet to GeoArrow (and vice versa). If parsing the JSON fails, you can (and I would argue you should) pass on the string unchanged, in either direction.
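
The parse-or-pass-through rule can be sketched generically. `CrsPayload` and `parse_or_passthrough` are hypothetical names; in practice the parser would likely be a closure such as `|s: &str| serde_json::from_str::<serde_json::Value>(s)`, but this sketch accepts any fallible parser so it compiles with `std` alone:

```rust
/// Hypothetical result type: either the parsed CRS or the original text,
/// so no CRS information is ever dropped.
#[derive(Debug, PartialEq)]
enum CrsPayload<T> {
    Parsed(T),
    Raw(String),
}

/// Try `parse`; on any error keep the input verbatim instead of failing.
fn parse_or_passthrough<T, E>(crs: &str, parse: impl Fn(&str) -> Result<T, E>) -> CrsPayload<T> {
    match parse(crs) {
        Ok(value) => CrsPayload::Parsed(value),
        // Malformed or unexpected input degrades to an opaque string.
        Err(_) => CrsPayload::Raw(crs.to_string()),
    }
}
```

Because the error branch keeps the original text, a false positive or a malformed PROJJSON payload becomes an opaque string on the other side rather than an error or a silently dropped CRS.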