etseidl commented on code in PR #8072:
URL: https://github.com/apache/arrow-rs/pull/8072#discussion_r2264018278


##########
parquet/src/basic.rs:
##########
@@ -238,15 +362,149 @@ pub enum LogicalType {
     /// A 16-bit floating point number.
     Float16,
     /// A Variant value.
-    Variant,
+    Variant {
+        /// The version of the variant specification that the variant was written with.
+        specification_version: Option<i8>,
+    },
     /// A geospatial feature in the Well-Known Binary (WKB) format with linear/planar edges interpolation.
-    Geometry,
+    Geometry {
+        /// A custom CRS. If unset, defaults to `OGC:CRS84`.
+        crs: Option<String>,
+    },
     /// A geospatial feature in the WKB format with an explicit (non-linear/non-planar) edges interpolation.
-    Geography,
+    Geography {
+        /// A custom CRS. If unset, defaults to `OGC:CRS84`.
+        crs: Option<String>,
+        /// An optional algorithm can be set to correctly interpret the edges interpolation
+        /// of the geometries. If unset, the algorithm defaults to `SPHERICAL`.
+        algorithm: Option<EdgeInterpolationAlgorithm>,
+    },
+    /// For forward compatibility; used when an unknown union value is encountered.
+    _Unknown {
+        /// The field id encountered when parsing the unknown logical type.
+        field_id: i16,
+    },
+}
+
+impl<'a> TryFrom<&mut ThriftCompactInputProtocol<'a>> for LogicalType {
+    type Error = ParquetError;
+    fn try_from(prot: &mut ThriftCompactInputProtocol<'a>) -> Result<Self> {

Review Comment:
   Yes, for now it's patterned on how the thrift compiler works. I know it's not the most maintainable way of doing things. I'd like to hide as much complexity in the macros as possible, but where the crate structures differ too much from the format structures, there's really no way around some hand-written parsing code. I'm open to suggestions for making this more maintainable.
   
   One early thought I had was to try to reproduce the parser cudf uses (example [here](https://github.com/rapidsai/cudf/blob/b7d8e9094da3379044a426e103eb5df47fa4b7f2/cpp/src/io/parquet/compact_protocol_reader.cpp#L578)). They use a tuple of functors, one for each field in a struct/union, and then run recursively through them for each field_id that's pulled out of the thrift stream. Once you get the hang of it, adding new structs is pretty easy. I'd really like to reproduce that here, but I don't know if that way of processing is possible in Rust.
   
   In the meantime, once I've got processing of the `ParquetMetaData` as fast as I can, I want to swing back and macro-ize as much as possible. I like the model @jhorstmann came up with of just pasting snippets of thrift IDL to generate code. That should be much more maintainable.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
