etseidl commented on code in PR #8072:
URL: https://github.com/apache/arrow-rs/pull/8072#discussion_r2264018278
##########
parquet/src/basic.rs:
##########
@@ -238,15 +362,149 @@ pub enum LogicalType {
/// A 16-bit floating point number.
Float16,
/// A Variant value.
- Variant,
+ Variant {
+ /// The version of the variant specification that the variant was written with.
+ specification_version: Option<i8>,
+ },
/// A geospatial feature in the Well-Known Binary (WKB) format with linear/planar edges interpolation.
- Geometry,
+ Geometry {
+ /// A custom CRS. If unset, defaults to `OGC:CRS84`.
+ crs: Option<String>,
+ },
/// A geospatial feature in the WKB format with an explicit (non-linear/non-planar) edges interpolation.
- Geography,
+ Geography {
+ /// A custom CRS. If unset, defaults to `OGC:CRS84`.
+ crs: Option<String>,
+ /// An optional algorithm can be set to correctly interpret the edges interpolation
+ /// of the geometries. If unset, the algorithm defaults to `SPHERICAL`.
+ algorithm: Option<EdgeInterpolationAlgorithm>,
+ },
+ /// For forward compatibility; used when an unknown union value is encountered.
+ _Unknown {
+ /// The field id encountered when parsing the unknown logical type.
+ field_id: i16,
+ },
+}
+
+impl<'a> TryFrom<&mut ThriftCompactInputProtocol<'a>> for LogicalType {
+ type Error = ParquetError;
+ fn try_from(prot: &mut ThriftCompactInputProtocol<'a>) -> Result<Self> {
Review Comment:
Yes, for now it's patterned on how the thrift compiler works. I know it's
not the most maintainable way of doing things. I'd like to hide as much
complexity in the macros as possible, but where the crate structures differ too
much from the format structures, there's really no way around some hand-written
parsing code. I'm open to suggestions on how to make this more maintainable.
One early thought I had was to reproduce the parser cudf uses
(example
[here](https://github.com/rapidsai/cudf/blob/b7d8e9094da3379044a426e103eb5df47fa4b7f2/cpp/src/io/parquet/compact_protocol_reader.cpp#L578)).
They use a tuple of functors, one for each field in a struct/union, and then
run through them recursively for each field_id that's pulled out of the thrift
stream. Once you get the hang of it, adding new structs is pretty easy. I'd
really like to reproduce that here, but I don't know whether that style of
processing is possible in Rust.
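For what it's worth, here's a rough sketch of what the functor-per-field idea could look like in Rust. The `Reader` and `Example` types below are simplified stand-ins (just `(field_id, value)` pairs), not the PR's actual thrift protocol types; the point is only that a table of field-id/handler pairs dispatched per field is expressible with plain fn pointers:

```rust
// Stand-in for a decoded thrift stream: (field_id, value) pairs.
struct Reader {
    fields: Vec<(i16, i64)>,
}

#[derive(Debug, Default, PartialEq)]
struct Example {
    a: i64,
    b: i64,
}

fn parse_example(reader: &Reader) -> Example {
    let mut out = Example::default();
    // One handler per field id, loosely mirroring cudf's tuple of functors.
    // Non-capturing closures coerce to fn pointers.
    let handlers: &[(i16, fn(&mut Example, i64))] = &[
        (1, |s, v| s.a = v),
        (2, |s, v| s.b = v),
    ];
    for &(id, value) in &reader.fields {
        // Dispatch to the matching handler; unknown ids are skipped,
        // which gives forward compatibility for free.
        if let Some(&(_, handler)) = handlers.iter().find(|&&(hid, _)| hid == id) {
            handler(&mut out, value);
        }
    }
    out
}

fn main() {
    let reader = Reader {
        fields: vec![(2, 7), (1, 3), (99, 0)], // 99 is an unknown field id
    };
    let parsed = parse_example(&reader);
    assert_eq!(parsed, Example { a: 3, b: 7 });
    println!("{parsed:?}");
}
```

A const lookup table (or a `match` on the field id) would avoid the linear scan, but the shape is the same either way.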
In the meantime, once I've got processing of the `ParquetMetaData` as fast
as I can, I want to swing back and macro-ize as much as possible. I like the
model @jhorstmann came up with of just pasting snippets of thrift IDL to
generate code. That should be much more maintainable.
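To illustrate the flavor of that idea (this is a hypothetical sketch, not the actual macro from this work): a declarative macro can accept a thrift-like struct description nearly verbatim and expand it into a Rust struct plus its field-id table. Only simple named types are handled here; a real version would also generate the field-by-field parser:

```rust
// Hypothetical macro: turn a thrift-IDL-like snippet into a Rust struct
// and a table of its thrift field ids.
macro_rules! thrift_struct {
    (struct $name:ident { $($id:literal : $fty:ident $field:ident;)* }) => {
        #[derive(Debug, Default, PartialEq)]
        struct $name {
            $($field: $fty,)*
        }
        impl $name {
            /// Thrift field ids, in declaration order.
            const FIELD_IDS: &'static [i16] = &[$($id,)*];
        }
    };
}

// The body below matches parquet's SortingColumn IDL, pasted almost verbatim.
thrift_struct! {
    struct SortingColumn {
        1: i32 column_idx;
        2: bool descending;
        3: bool nulls_first;
    }
}

fn main() {
    let s = SortingColumn { column_idx: 4, descending: true, nulls_first: false };
    assert_eq!(SortingColumn::FIELD_IDS, &[1i16, 2, 3]);
    assert!(s.descending);
    println!("{s:?}");
}
```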
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]