paleolimbot commented on issue #19387: URL: https://github.com/apache/datafusion/issues/19387#issuecomment-3672656847
Thank you for writing this up! I mentioned the multi-part field references because I happen to know they exist, but I don't mean to derail your fantastic work on physical expressions and Parquet if it turns out it's a distraction to what you're trying to do here.

As a data point, Substrait field references mix struct column and array item references in the same structure: https://substrait.io/expressions/field_references/ . As another, Arrow C++ doesn't (the `FieldRef` is something like a `Vec<FieldPath>`).

In the absence of Variant existing or backwards compatibility concerns, I would possibly just suggest updating `Column` to something like:

```rust
struct Column {
    name: String,
    index: usize,
    more_indexes: Vec<usize>, // Just spitballing here
}
```

...and `projection: Vec<usize>` to `projection: Projection` (where `Projection` could be a `Vec<usize>` or maybe `Vec<Vec<usize>>`).

In the presence of Variant existing, walking a `PhysicalExpr` for a `get_field` ScalarUDF call is possibly much easier. Whatever registers variant support could possibly replace `get_field()` with a ScalarUDF that can evaluate against a variant array, too. (No opinions here, just spitballing).
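
To make the `more_indexes` idea a bit more concrete, here is a minimal sketch of how such a `Column` might resolve against a nested schema. Everything other than the three `Column` fields is an assumption for illustration: `DataType`, `Field`, `make_field`, `example_schema`, `resolve`, and `path` are hypothetical toy names, not the Arrow or DataFusion types, and the real `Column` in DataFusion has no `more_indexes` field today.

```rust
// Toy stand-ins for a nested schema. These are NOT arrow-rs or DataFusion
// types; they exist only so the sketch compiles without dependencies.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int64,
    Utf8,
    Struct(Vec<Field>),
}

#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: DataType,
}

// The shape floated in the comment: a top-level index plus child indexes.
#[derive(Debug, Clone, PartialEq)]
struct Column {
    name: String,
    index: usize,
    more_indexes: Vec<usize>,
}

impl Column {
    /// Follow `index`, then each entry of `more_indexes`, through struct
    /// children. Returns `None` if any step is out of range or tries to
    /// descend into a non-struct field.
    fn resolve<'a>(&self, schema: &'a [Field]) -> Option<&'a Field> {
        let mut current = schema.get(self.index)?;
        for &child in &self.more_indexes {
            match &current.data_type {
                DataType::Struct(children) => current = children.get(child)?,
                _ => return None,
            }
        }
        Some(current)
    }

    /// The full index path, i.e. what one entry of a hypothetical
    /// `Vec<Vec<usize>>` projection could look like.
    fn path(&self) -> Vec<usize> {
        let mut path = vec![self.index];
        path.extend_from_slice(&self.more_indexes);
        path
    }
}

// Hypothetical helper to keep the example schema readable.
fn make_field(name: &str, data_type: DataType) -> Field {
    Field { name: name.to_string(), data_type }
}

// Example schema: id: Int64, address: Struct { city: Utf8, zip: Int64 }
fn example_schema() -> Vec<Field> {
    vec![
        make_field("id", DataType::Int64),
        make_field(
            "address",
            DataType::Struct(vec![
                make_field("city", DataType::Utf8),
                make_field("zip", DataType::Int64),
            ]),
        ),
    ]
}
```

With this schema, a `Column` with `index: 1` and `more_indexes: vec![1]` would resolve to the `zip` child of `address`, and its `path()` would be the nested projection entry `[1, 1]`. Note that this sketch only descends into struct children; it does not cover array item references of the kind Substrait field references allow.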
