etseidl commented on code in PR #8072: URL: https://github.com/apache/arrow-rs/pull/8072#discussion_r2264018278
##########
parquet/src/basic.rs:
##########
@@ -238,15 +362,149 @@ pub enum LogicalType {
     /// A 16-bit floating point number.
     Float16,
     /// A Variant value.
-    Variant,
+    Variant {
+        /// The version of the variant specification that the variant was written with.
+        specification_version: Option<i8>,
+    },
     /// A geospatial feature in the Well-Known Binary (WKB) format with linear/planar edges interpolation.
-    Geometry,
+    Geometry {
+        /// A custom CRS. If unset the defaults to `OGC:CRS84`.
+        crs: Option<String>,
+    },
     /// A geospatial feature in the WKB format with an explicit (non-linear/non-planar) edges interpolation.
-    Geography,
+    Geography {
+        /// A custom CRS. If unset the defaults to `OGC:CRS84`.
+        crs: Option<String>,
+        /// An optional algorithm can be set to correctly interpret edges interpolation
+        /// of the geometries. If unset, the algorithm defaults to `SPHERICAL`.
+        algorithm: Option<EdgeInterpolationAlgorithm>,
+    },
+    /// For forward compatibility; used when an unknown union value is encountered.
+    _Unknown {
+        /// The field id encountered when parsing the unknown logical type.
+        field_id: i16,
+    },
+}
+
+impl<'a> TryFrom<&mut ThriftCompactInputProtocol<'a>> for LogicalType {
+    type Error = ParquetError;
+    fn try_from(prot: &mut ThriftCompactInputProtocol<'a>) -> Result<Self> {

Review Comment:
   Yes, for now it's patterned on how the thrift compiler works. I know it's not the most maintainable way of doing things. I'd like to hide as much complexity in the macros as possible, but where the crate structures differ too much from the format structures, there's really no way around some hand-written parsing code. I'm open to suggestions as to how to make this more maintainable. One early thought I had was to try to reproduce the parser cudf uses (example [here](https://github.com/rapidsai/cudf/blob/b7d8e9094da3379044a426e103eb5df47fa4b7f2/cpp/src/io/parquet/compact_protocol_reader.cpp#L578)).
   They use a tuple of functors, one for each field in a struct/union, and then run recursively through them for each field_id that's pulled out of the thrift stream. Once you get the hang of it, adding new structs is pretty easy. I'd really like to reproduce that here, but I don't know if that way of processing is possible in Rust. In the meantime, once I've got processing of the `ParquetMetaData` as fast as I can, I want to swing back and macro-ize as much as possible. I like the model @jhorstmann came up with of just pasting snippets of thrift IDL to generate code. That should be much more maintainable.
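   For what it's worth, a per-field handler table dispatched on `field_id` is expressible in Rust. The following is only a minimal sketch, not the arrow-rs or cudf code: the struct, field ids, and values are invented for illustration, and the thrift stream is mocked as plain `(field_id, value)` pairs rather than a real compact-protocol reader.

```rust
// Sketch of the cudf-style "table of functors" approach in Rust:
// one handler per known field id, dispatched as field ids are pulled
// from the (mocked) thrift stream.

#[derive(Debug, Default, PartialEq)]
struct SchemaElement {
    type_length: Option<i64>,
    num_children: Option<i64>,
}

// Fn-pointer type standing in for cudf's functors.
type FieldHandler = fn(&mut SchemaElement, i64);

fn set_type_length(s: &mut SchemaElement, v: i64) {
    s.type_length = Some(v);
}

fn set_num_children(s: &mut SchemaElement, v: i64) {
    s.num_children = Some(v);
}

// Analogue of the tuple of functors: one entry per known field id.
const HANDLERS: &[(i16, FieldHandler)] = &[(2, set_type_length), (5, set_num_children)];

fn parse(fields: &[(i16, i64)]) -> SchemaElement {
    let mut elem = SchemaElement::default();
    for &(field_id, value) in fields {
        // Dispatch on field_id; unknown ids are skipped, which is what
        // thrift requires for forward compatibility.
        if let Some(&(_, handler)) = HANDLERS.iter().find(|(id, _)| *id == field_id) {
            handler(&mut elem, value);
        }
    }
    elem
}

fn main() {
    // Field 99 is unknown and silently ignored.
    let elem = parse(&[(2, 16), (5, 3), (99, 0)]);
    assert_eq!(elem.type_length, Some(16));
    assert_eq!(elem.num_children, Some(3));
}
```

   A real version would pull typed values from the compact protocol reader instead of `i64`s, and the handler table could be generated by a macro from the thrift IDL, which dovetails with the IDL-pasting model mentioned above.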