Dandandan opened a new issue #240:
URL: https://github.com/apache/arrow-datafusion/issues/240


   **Is your feature request related to a problem or challenge? Please describe 
what you are trying to do.**
   To reduce memory usage, and potentially also improve speed, the data in 
`visited_left_side` in the `HashJoinStream` could be stored in a bitmap 
instead of a `Vec<bool>`. This would save ~7/8 of a byte per left row.
   If we store _only_ 32-bit integers on the left, the savings would be ~4-5%, 
assuming we use 4 bytes for the items and roughly 16 bytes per left side row 
for the hashmap. Not too big, but a nice win in some cases. The relative 
savings could be bigger once we use a more memory-efficient data structure 
for the hashmap.
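   The estimate above can be checked with a small calculation. This is only a 
sketch: the function name and the per-row byte counts passed to it are 
illustrative assumptions taken from the rough figures in this issue, not 
measurements.

```rust
/// Fraction of per-row memory saved by replacing a one-byte `bool` flag
/// with a single bit. `key_bytes` and `hashmap_bytes` are rough per-row
/// figures supplied by the caller (assumptions, not measured values).
fn bitmap_savings_fraction(key_bytes: f64, hashmap_bytes: f64) -> f64 {
    let flag_bytes_before = 1.0; // `Vec<bool>` stores one byte per element
    let flag_bytes_after = 1.0 / 8.0; // a bitmap stores one bit per element
    let total_before = key_bytes + hashmap_bytes + flag_bytes_before;
    (flag_bytes_before - flag_bytes_after) / total_before
}
```

   With 4 bytes for the key and 16 bytes for the hashmap entry, this gives 
0.875 / 21, a little over 4%.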
   
   Additionally, when every row is unmatched or no row is unmatched, it 
could include a fast path for those cases.
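   One way such a fast path might be detected is by counting set bits per 
word. The sketch below is hypothetical: `VisitedSummary` and 
`summarize_visited` are names made up for illustration, and it assumes the 
bitmap is a slice of `u64` words whose unused trailing bits are zero.

```rust
/// Outcome of scanning the visited bitmap (hypothetical type).
#[derive(Debug, PartialEq)]
enum VisitedSummary {
    /// No left row was matched: every left row is unmatched.
    NoneVisited,
    /// Every left row was matched: no left row is unmatched.
    AllVisited,
    /// Some rows matched, some did not: fall back to the per-row path.
    Mixed,
}

/// Classifies a bitmap of `len` rows stored in `words`.
/// Assumes bits at positions >= `len` in the last word are zero.
fn summarize_visited(words: &[u64], len: usize) -> VisitedSummary {
    // Count set bits one word at a time instead of testing each row.
    let set: usize = words.iter().map(|w| w.count_ones() as usize).sum();
    if set == 0 {
        VisitedSummary::NoneVisited
    } else if set == len {
        VisitedSummary::AllVisited
    } else {
        VisitedSummary::Mixed
    }
}
```

   In the two uniform cases the join could emit all left rows, or none of 
them, without inspecting individual bits.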
   
   **Describe the solution you'd like**
   Use a bitmap instead of `Vec<bool>`. The bitmap could be from arrow or maybe 
the `bitvec` crate.
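   To make the idea concrete, here is a minimal hand-rolled sketch of a 
bitmap that could stand in for the `Vec<bool>`. It deliberately avoids the 
arrow and `bitvec` APIs; `VisitedBitmap` and its methods are hypothetical 
names, and a real change would more likely reuse one of those libraries.

```rust
/// One bit per left-side row, packed into `u64` words (hypothetical type).
struct VisitedBitmap {
    words: Vec<u64>,
    len: usize,
}

impl VisitedBitmap {
    /// Creates a bitmap for `len` rows, all initially unvisited.
    fn new(len: usize) -> Self {
        Self { words: vec![0; (len + 63) / 64], len }
    }

    /// Marks `row` as visited.
    fn set(&mut self, row: usize) {
        assert!(row < self.len, "row out of bounds");
        self.words[row / 64] |= 1u64 << (row % 64);
    }

    /// Returns whether `row` has been visited.
    fn get(&self, row: usize) -> bool {
        assert!(row < self.len, "row out of bounds");
        (self.words[row / 64] >> (row % 64)) & 1 == 1
    }

    /// Collects the indices of rows that were never visited.
    fn unvisited(&self) -> Vec<usize> {
        (0..self.len).filter(|&row| !self.get(row)).collect()
    }
}
```

   For 70 rows this allocates two `u64` words (16 bytes) where a `Vec<bool>` 
would hold 70 bytes.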
   
   **Describe alternatives you've considered**
   Keep using a `Vec<bool>`
   
   **Additional context**
   n/a


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

