sunchao commented on issue #4729:
URL: https://github.com/apache/arrow-rs/issues/4729#issuecomment-1690800836
> Broadly speaking I think all the kernels where it makes sense to
> accommodate dictionaries, now support dictionaries in some form?
We ran into some issues with dictionary arrays of primitive value types when
migrating to DataFusion's new aggregation implementation. Unpacking the
dictionaries early helped us bypass the errors. We are still in the process
of migrating, so we'll see 😂
> However, this requires explicit handling of the "dictionary" case. The
> proposed new model, so much as I understand it would not achieve this, and
> would be no better than DF coercing both inputs non-dictionary types?
I think this part can be hidden from the kernels. Perhaps we can implement a
`map` method on `PrimitiveArray` to handle the dictionary and non-dictionary
case?
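To make the idea concrete, here is a minimal sketch using a toy model rather than the real arrow-rs types. `Column`, its variants, and `to_plain` are hypothetical names invented for illustration, and nulls are ignored. The point is only that a single `map` entry point can hide the dictionary branch from the caller:

```rust
/// Toy stand-in for a primitive column; this is NOT the arrow-rs `PrimitiveArray`.
#[derive(Debug, Clone, PartialEq)]
pub enum Column {
    /// One value per row.
    Plain(Vec<i32>),
    /// Row `i` has the value `values[keys[i]]`.
    Dictionary { keys: Vec<usize>, values: Vec<i32> },
}

impl Column {
    /// Applies `f` to every logical value. In the dictionary case `f` runs
    /// once per dictionary entry and the keys are reused unchanged.
    pub fn map<F: Fn(i32) -> i32>(&self, f: F) -> Column {
        match self {
            Column::Plain(values) => Column::Plain(values.iter().map(|&v| f(v)).collect()),
            Column::Dictionary { keys, values } => Column::Dictionary {
                keys: keys.clone(),
                values: values.iter().map(|&v| f(v)).collect(),
            },
        }
    }

    /// Unpacks the column into one value per row.
    pub fn to_plain(&self) -> Vec<i32> {
        match self {
            Column::Plain(values) => values.clone(),
            Column::Dictionary { keys, values } => keys.iter().map(|&k| values[k]).collect(),
        }
    }
}
```

A caller would write `column.map(|v| v + 1)` in both cases; only the implementation of `map` knows whether a dictionary is involved.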
Slightly off topic :) I'm hoping that we can also define some function
API that makes it easier for people to implement kernels and UDFs. Something like:
```rust
/// A function that only takes a single argument.
/// (`TypeTrait`, `PlainVector` and `MutableVector` are placeholder types.)
pub trait Function<T: TypeTrait, R: TypeTrait> {
    fn call(&self, arg: &T::Native, result: &mut R::Native) -> bool;

    fn call_batch(&self, arg: &PlainVector, result: &mut MutableVector) {
        // default implementation here
    }
}
```
Unifying the dictionary array may help a bit in that direction.
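For illustration only, a self-contained variant of the trait idea might look like the sketch below. It makes several assumptions that are not in the proposal: plain slices stand in for `PlainVector` and `MutableVector`, ordinary generic parameters replace `TypeTrait`, the `bool` returned by `call` is read as "the result is valid (non-null)", and `CheckedSqrt` is a made-up example kernel.

```rust
/// Simplified, hypothetical version of the proposed trait.
pub trait Function<T, R> {
    /// Computes one output. Returning `false` is assumed here to mean "null".
    fn call(&self, arg: &T, result: &mut R) -> bool;

    /// Default batch implementation: calls `call` once per row and
    /// returns one validity flag per row.
    fn call_batch(&self, args: &[T], results: &mut [R]) -> Vec<bool> {
        assert_eq!(args.len(), results.len());
        args.iter()
            .zip(results.iter_mut())
            .map(|(a, r)| self.call(a, r))
            .collect()
    }
}

/// Example kernel: square root, treated as null for negative inputs.
pub struct CheckedSqrt;

impl Function<i64, f64> for CheckedSqrt {
    fn call(&self, arg: &i64, result: &mut f64) -> bool {
        if *arg < 0 {
            return false;
        }
        *result = (*arg as f64).sqrt();
        true
    }
}
```

With this shape a UDF author only writes the scalar `call`, and the batch loop comes for free from the default method; a dictionary-aware `call_batch` could later be swapped in without touching the implementors.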
> Oh it is getting triggered, it is just generating very sub-optimal code 😄
> There's a veritable wall of memory shuffle operators, I honestly have a hard
> time following what LLVM is doing...
Yeah, I have no idea why LLVM does that. But it generates the same code
even without the `is_dictionary` flag, so I doubt the flag is to
blame :)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.