Dandandan edited a comment on issue #790:
URL: 
https://github.com/apache/arrow-datafusion/issues/790#issuecomment-888516528


   Yes, this (great) explanation / overview matches my suggestion. Thanks for 
the write up!! Somehow the ASCII art doesn't render well here?
   
   A couple of points:
   * Great idea to use a `Vec<ScalarValue>` first. It seems it should be quite 
easy to swap that to a typed alternative later, and probably involves less 
changes now
   * To convert the group by values it can re-use the `create_hashes` function 
from the hash join. This way it will also benefit from improvements there and 
is calculated on (multiple) Arrays directly.
   
   * Before implementing this proposal it is I think be possible with minimal 
changes to also to include a null part in the hashmap key, so the key would 
become something like `(bool, Vec<u8>)` as hashmap key. A null key would have 
`(False, vec![])` as value, an actual value would contain a True and the bytes 
of the contents.
   That will be a bit slower and consume more memory, but at least allows to 
group on nulls too.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Reply via email to