Dandandan opened a new pull request, #21345:
URL: https://github.com/apache/datafusion/pull/21345

   ## Which issue does this PR close?
   
   N/A - performance optimization
   
   ## Rationale for this change
   
   When probing `ArrowBytesViewMap` for strings longer than 12 bytes (those 
not stored inline in the view), the existing code compared only the 4-byte 
prefix before falling through to a full byte comparison. It did not compare 
the string length first, so two strings with different lengths but the same 
hash and prefix would trigger an unnecessary full memcmp.
   
   ## What changes are included in this PR?
   
   Compare the first 8 bytes of the StringView (length + 4-byte prefix) as a 
single `u64` instead of extracting and comparing only the 4-byte prefix. This:
   - Rejects length mismatches earlier (before the full byte comparison)
   - Is simpler (one `u64` comparison vs one `u32` extraction + comparison)
   - Is no slower (an 8-byte compare costs the same as a 4-byte compare on modern 64-bit CPUs)
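
As a rough illustration of the idea (not the code in this PR), the sketch below models a view as a `u128` in the Arrow layout for long strings: length in the low 32 bits, the 4-byte prefix in the next 32 bits, then buffer index and offset. The helper names `make_long_view`, `prefix_matches`, and `len_and_prefix_match` are hypothetical and do not come from DataFusion or arrow-rs.

```rust
/// Hypothetical helper: packs a view for a string longer than 12 bytes.
/// Layout assumed here: bits 0..32 = length, bits 32..64 = first 4 bytes,
/// bits 64..96 = buffer index, bits 96..128 = offset into that buffer.
fn make_long_view(s: &[u8], buffer_index: u32, offset: u32) -> u128 {
    assert!(s.len() > 12, "short strings are stored inline instead");
    let len = s.len() as u32;
    let prefix = u32::from_le_bytes([s[0], s[1], s[2], s[3]]);
    (len as u128)
        | ((prefix as u128) << 32)
        | ((buffer_index as u128) << 64)
        | ((offset as u128) << 96)
}

/// Old-style check: extract and compare only the 4-byte prefix.
fn prefix_matches(a: u128, b: u128) -> bool {
    ((a >> 32) as u32) == ((b >> 32) as u32)
}

/// New-style check: compare length and prefix together as one u64.
/// Truncating to u64 keeps the low 8 bytes of the view.
fn len_and_prefix_match(a: u128, b: u128) -> bool {
    (a as u64) == (b as u64)
}
```

With this sketch, views for `"abcdefghijklmnop"` and `"abcdefghijklmnopqrs"` pass `prefix_matches` but fail `len_and_prefix_match`, so the full byte comparison can be skipped. Two views of the same string at different buffer positions still pass both checks and proceed to the full comparison.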
   
   ## Are these changes tested?
   
   Existing tests cover this code path.
   
   ## Are there any user-facing changes?
   
   No, this is a performance optimization only.
   
   🤖 Generated with [Claude Code](https://claude.com/claude-code)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

