[GitHub] [arrow-rs] sunchao commented on issue #4729: Remove dictionary type from Arrow logical type

via GitHub Wed, 23 Aug 2023 17:12:12 -0700


sunchao commented on issue #4729:
URL: https://github.com/apache/arrow-rs/issues/4729#issuecomment-1690800836


   > Broadly speaking I think all the kernels where it makes sense to 
accommodate dictionaries, now support dictionaries in some form?
   
   We ran into some issues with dictionary arrays of primitive value types when migrating to DataFusion's (DF) new aggregation implementation. Unpacking the dictionaries early let us work around the error. We are still in the process of migrating, so we'll see 😂 
   
   > However, this requires explicit handling of the "dictionary" case. The 
proposed new model, so much as I understand it would not achieve this, and 
would be no better than DF coercing both inputs non-dictionary types?
   
   I think this part can be hidden from the kernels. Perhaps we can implement a `map` method on `PrimitiveArray` that handles both the dictionary and non-dictionary cases?
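A minimal sketch of that idea. It assumes a hypothetical, simplified `PrimitiveColumn<T>` enum, not the real arrow-rs `PrimitiveArray` or `DictionaryArray` types. `map` applies a closure to every logical value. For dictionary-encoded data it only transforms the distinct `values` and reuses the `keys`, so a kernel written against `map` never has to know about the encoding.

```rust
/// Hypothetical simplified column (NOT the arrow-rs types): either plain values,
/// or dictionary-encoded, where each key indexes into a small `values` array.
#[derive(Debug, Clone, PartialEq)]
enum PrimitiveColumn<T> {
    Plain(Vec<T>),
    Dictionary { keys: Vec<usize>, values: Vec<T> },
}

impl<T: Copy> PrimitiveColumn<T> {
    /// Applies `f` to every logical value. For dictionaries, `f` runs once per
    /// distinct value and the keys are reused, so the encoding is preserved.
    fn map<U, F: Fn(T) -> U>(&self, f: F) -> PrimitiveColumn<U> {
        match self {
            PrimitiveColumn::Plain(v) => PrimitiveColumn::Plain(v.iter().map(|x| f(*x)).collect()),
            PrimitiveColumn::Dictionary { keys, values } => PrimitiveColumn::Dictionary {
                keys: keys.clone(),
                values: values.iter().map(|x| f(*x)).collect(),
            },
        }
    }

    /// Materializes the logical values, i.e. "unpacking the dictionary early".
    fn to_plain(&self) -> Vec<T> {
        match self {
            PrimitiveColumn::Plain(v) => v.clone(),
            PrimitiveColumn::Dictionary { keys, values } => {
                keys.iter().map(|&k| values[k]).collect()
            }
        }
    }
}
```

Mapping only the dictionary values is what makes keeping the encoding worthwhile: the closure runs once per distinct value rather than once per row. `to_plain` corresponds to the alternative of unpacking the dictionary before the kernel runs.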
   
   Slightly off topic :) I'm hoping that we can also define a function API that makes it easier for people to implement kernels and UDFs. Something like:
   
   ```rust
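   /// A function that only takes a single argument.
   // `TypeTrait`, `PlainVector` and `MutableVector` are not defined in this sketch.
   pub trait Function<T: TypeTrait, R: TypeTrait> {
       /// Evaluates the function on a single value.
       fn call(&self, arg: &T::Native, result: &mut R::Native) -> bool;

       /// Evaluates the function on a whole batch of values.
       fn call_batch(&self, arg: &PlainVector, result: &mut MutableVector) {
           // default implementation here
       }
   }
   ```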
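To make the proposal concrete, here is a self-contained sketch of the trait with a working default `call_batch`. It substitutes hypothetical pieces for the undefined ones. `TypeTrait` is a minimal marker trait with an associated `Native` type. Batches are plain `&[Option<Native>]` slices instead of `PlainVector`/`MutableVector`. The sketch also assumes that a `false` return from `call` means the output slot is null.

```rust
/// Hypothetical marker trait standing in for an Arrow primitive type.
pub trait TypeTrait {
    type Native: Copy + Default;
}

/// Hypothetical marker for 64-bit floats.
pub struct Float64Type;
impl TypeTrait for Float64Type {
    type Native = f64;
}

/// A function that only takes a single argument.
pub trait Function<T: TypeTrait, R: TypeTrait> {
    /// Computes one output value; returning `false` marks it as null (assumption).
    fn call(&self, arg: &T::Native, result: &mut R::Native) -> bool;

    /// Default batch implementation: null inputs stay null, others go through `call`.
    fn call_batch(&self, args: &[Option<T::Native>]) -> Vec<Option<R::Native>> {
        args.iter()
            .copied()
            .map(|arg| {
                arg.and_then(|v| {
                    let mut out: R::Native = Default::default();
                    self.call(&v, &mut out).then_some(out)
                })
            })
            .collect()
    }
}

/// Example UDF: square root that yields null for negative inputs.
pub struct Sqrt;
impl Function<Float64Type, Float64Type> for Sqrt {
    fn call(&self, arg: &f64, result: &mut f64) -> bool {
        if *arg < 0.0 {
            return false;
        }
        *result = arg.sqrt();
        true
    }
}
```

With this shape, a UDF author only writes the scalar `call`. The loop and null propagation come from the default method, which could later be specialized (for example, for dictionary inputs) without touching the UDF.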
   
   Unifying the dictionary array may help a bit in that direction.
   
   > Oh it is getting triggered, it is just generating very sub-optimal code 😄 
There's a veritable wall of memory shuffle operators, I honestly have a hard 
time following what LLVM is doing...
   
   Yeah, I have no idea why LLVM does that. But it generates the same code even without the `is_dictionary` flag, so I doubt the flag is to blame :)
   


-- 
This is an automated message from the Apache Git Service.
