Dandandan opened a new pull request, #21393:
URL: https://github.com/apache/datafusion/pull/21393

   ## Which issue does this PR close?
   
   Related to memory optimization.
   
   ## Rationale for this change
   
   `ArrowBytesViewMap` is used for `GROUP BY` and `COUNT DISTINCT` on 
string/binary view types. Each hash table `Entry` previously stored `{ view: 
u128, hash: u64, payload: V }` (32+ bytes). 
   
   The `view: u128` was redundant with the `views` Vec, and the `payload: V` 
was always either `()` (for sets) or equivalent to the insertion index (for 
group-by). Since entries are inserted sequentially, the view index *is* the 
group index.
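
As a rough illustration of the size argument above, the two layouts can be compared with `std::mem::size_of`. This is a minimal sketch, not the actual DataFusion definitions: the struct names `OldEntry` and `NewEntry` and the helper `entry_sizes` are hypothetical, and only the field names and types come from the description.

```rust
use std::mem::size_of;

/// Hypothetical stand-in for the previous layout: the view is stored inline
/// next to the hash and a generic payload.
#[allow(dead_code)]
struct OldEntry<V> {
    view: u128,
    hash: u64,
    payload: V,
}

/// Hypothetical stand-in for the new layout: only an index into the `views`
/// Vec plus the hash. The index doubles as the group index.
#[allow(dead_code)]
struct NewEntry {
    view_idx: usize,
    hash: u64,
}

/// Returns (old size with a unit payload, new size) in bytes.
fn entry_sizes() -> (usize, usize) {
    (size_of::<OldEntry<()>>(), size_of::<NewEntry>())
}
```

On a 64-bit target `NewEntry` is two 8-byte fields, so 16 bytes. The exact size of `OldEntry<()>` depends on the alignment the compiler gives `u128` on the target, but it is at least 24 bytes before padding and any non-unit payload.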
   
   ## What changes are included in this PR?
   
   - Remove the `V` generic parameter from `ArrowBytesViewMap` and `Entry`
   - Replace `view: u128` in `Entry` with `view_idx: usize` (index into the 
`views` Vec)
   - `Entry` shrinks from 32+ bytes to 16 bytes (50%+ reduction in per-entry 
hash table memory)
   - Simplify `insert_if_new` API from two callbacks (`make_payload_fn`, 
`observe_payload_fn`) to one (`observe_fn(usize)`)
   - Simplify `GroupValuesBytesView` by removing the redundant `num_groups` field
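
The single-callback shape described in the list above can be sketched roughly as follows. This is a simplified, hypothetical model and not the real `ArrowBytesViewMap`: it uses `String` values and a standard-library `HashMap` instead of Arrow views and a raw hash table, and the names `SketchMap` and `group_indices` are invented for illustration. It only shows why one `observe_fn(usize)` callback is enough once the insertion index is the group index.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hypothetical, simplified stand-in for the map described above.
struct SketchMap {
    /// Distinct values in insertion order; a value's position is its group index.
    values: Vec<String>,
    /// Hash -> indices into `values` that share that hash.
    table: HashMap<u64, Vec<usize>>,
}

impl SketchMap {
    fn new() -> Self {
        Self { values: Vec::new(), table: HashMap::new() }
    }

    /// Inserts each input value if unseen, then reports its index via `observe_fn`.
    /// No separate payload is created: the index into `values` is the payload.
    fn insert_if_new<F: FnMut(usize)>(&mut self, input: &[&str], mut observe_fn: F) {
        for s in input {
            let mut hasher = DefaultHasher::new();
            s.hash(&mut hasher);
            let hash = hasher.finish();

            let values = &mut self.values;
            let bucket = self.table.entry(hash).or_default();
            // Look for an existing entry with the same hash and equal value.
            let found = bucket.iter().copied().find(|&i| values[i] == *s);
            let idx = match found {
                Some(i) => i,
                None => {
                    // New value: its index is the next sequential position.
                    let i = values.len();
                    values.push((*s).to_string());
                    bucket.push(i);
                    i
                }
            };
            observe_fn(idx);
        }
    }
}

/// Collects the group index observed for every input row.
fn group_indices(input: &[&str]) -> Vec<usize> {
    let mut map = SketchMap::new();
    let mut out = Vec::with_capacity(input.len());
    map.insert_if_new(input, |idx| out.push(idx));
    out
}
```

In this sketch a group-by caller records the observed index for each row, while a set-style caller (as in `COUNT DISTINCT`) can pass a no-op closure and read the number of distinct values from `values.len()`.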
   
   ## Are these changes tested?
   
   Yes, existing tests pass. Added `test_entry_size` to verify that `Entry` is 
16 bytes.
   
   ## Are there any user-facing changes?
   
   No.
   
   🤖 Generated with [Claude Code](https://claude.com/claude-code)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

