[GitHub] [arrow-datafusion] Dandandan edited a comment on issue #790: Rework GroupByHash to support grouping by nulls

GitBox Wed, 28 Jul 2021 11:13:02 -0700


Dandandan edited a comment on issue #790:
URL: 
https://github.com/apache/arrow-datafusion/issues/790#issuecomment-888516528



   Yes, this (great) explanation / overview matches my suggestion. Thanks for 
the write up!! Somehow the ASCII art doesn't render well here?
   
   A couple of points:
   * Great idea to use a `Vec<ScalarValue>` first. It seems it should be quite 
easy to swap that to a typed alternative later, and probably involves less 
changes now
   * To convert the group by values it can re-use the `create_hashes` function 
from the hash join. This way it will also benefit from improvements there and 
is calculated on (multiple) Arrays directly.
   
   * Before implementing this proposal it is I think be possible with minimal 
changes to also to include a null part in the hashmap key, so the key would 
become something like `(bool, Vec<u8>)` as hashmap key. A null key would have 
`(False, vec![])` as value, an actual value would contain a True and the bytes 
of the contents.
   That will be a bit slower and consume more memory, but at least allows to 
group on nulls too.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

[GitHub] [arrow-datafusion] Dandandan edited a comment on issue #790: Rework GroupByHash to support grouping by nulls

Reply via email to