Dandandan edited a comment on issue #790:
URL: 
https://github.com/apache/arrow-datafusion/issues/790#issuecomment-888516528


   Yes, this (great) explanation / overview matches my suggestion. Thanks for 
the write up!! Somehow the ASCII art doesn't render well here?
   
   A couple of points:
   * Great idea to use a `Vec<ScalarValue>` first. It seems it should be quite 
easy to swap that to a typed /mutable array alternative later, and probably 
involves less changes now
   * To convert the group by values it can re-use the `create_hashes` function 
from the hash join. This way it will also benefit from improvements there and 
is calculated on (multiple) Arrays directly.
   
   * Before implementing this proposal it is I think be possible with minimal 
changes to also to include a null part in the hashmap key, so the key would 
become something like `(bool, Vec<u8>)` as hashmap key. A null key would have 
`(False, vec![])` as value, an actual value would contain a True and the bytes 
of the contents.
   That will be a bit slower and consume more memory, but at least allows to 
group on nulls too.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Reply via email to