sundy-li commented on issue #1047: URL: https://github.com/apache/arrow-rs/issues/1047#issuecomment-1573015858
> For what it is worth I think this is what DuckDB does (at least this is how I interpret this slide from the [22 - DuckDB Internals (CMU Advanced Databases / Spring 2023)](https://www.youtube.com/watch?v=bZOvAKGkzpQ) lecture)

If the array is already in Flat format, this approach needs to construct an extra `Selection vector` to iterate over the array, which probably adds a little overhead compared with iterating over the array itself.

We use an enum to represent the vector, which makes it hard to add support for another variant such as a dictionary vector.

```
#[derive(Debug, Clone, PartialEq, EnumAsInner)]
pub enum ValueRef<'a, T: ValueType> {
    Scalar(T::ScalarRef<'a>),
    Column(T::Column),
}
```

The main drawback is code bloat, so we use a code generator to help write the vectorized methods; [the generated file](https://github.com/datafuselabs/databend/blob/aa97d2c5b57f1585abc8192e5fb76f9fc4157903/src/query/expression/src/register.rs#L1092-L1105) is ~6000 LOC.
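To make both points concrete, here is a rough sketch under simplified assumptions. It is not Databend's or DuckDB's actual code: `Value`, `add`, `sum_flat`, and `sum_selected` are hypothetical names, and everything is fixed to `i64` instead of being generic over a `ValueType`.

```rust
/// Simplified stand-in for the enum above (hypothetical, i64 only):
/// either one scalar broadcast to every row, or a full column.
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Scalar(i64),
    Column(Vec<i64>),
}

/// Hypothetical binary kernel. Each (lhs, rhs) variant pair needs its own arm,
/// so 2 variants give 4 arms; a third variant (e.g. Dictionary) would give 9.
pub fn add(lhs: &Value, rhs: &Value) -> Value {
    match (lhs, rhs) {
        (Value::Scalar(a), Value::Scalar(b)) => Value::Scalar(a + b),
        (Value::Scalar(a), Value::Column(b)) => {
            Value::Column(b.iter().map(|y| a + y).collect())
        }
        (Value::Column(a), Value::Scalar(b)) => {
            Value::Column(a.iter().map(|x| x + b).collect())
        }
        (Value::Column(a), Value::Column(b)) => {
            assert_eq!(a.len(), b.len(), "columns must have equal length");
            Value::Column(a.iter().zip(b).map(|(x, y)| x + y).collect())
        }
    }
}

/// Iterating a flat array directly: one load per row.
pub fn sum_flat(values: &[i64]) -> i64 {
    values.iter().sum()
}

/// Iterating through a selection vector: an extra index load per row,
/// and for flat input the identity selection 0..n has to be built first.
pub fn sum_selected(values: &[i64], selection: &[usize]) -> i64 {
    selection.iter().map(|&i| values[i]).sum()
}
```

With only two variants a binary function already needs four match arms, and the count grows with the square of the number of variants (and faster for functions with more arguments), which is roughly why hand-writing these kernels does not scale. On the selection-vector side, calling `sum_selected` on flat data means first materializing the indices `0..n`, which is the extra step the comment refers to.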
