Tushar7012 commented on code in PR #19975:
URL: https://github.com/apache/datafusion/pull/19975#discussion_r2724548091
##########
datafusion/physical-expr-common/src/binary_view_map.rs:
##########
@@ -267,28 +273,49 @@ where
};
observe_payload_fn(payload);
continue;
- };
+ }
- // get the value as bytes
- let value: &[u8] = value.as_ref();
+ // Extract length from the view (first 4 bytes of u128 in little-endian)
+ let len = (view_u128 & 0xFFFFFFFF) as u32;
let entry = self.map.find_mut(hash, |header| {
if header.hash != hash {
return false;
}
- let v = self.builder.get_value(header.view_idx);
- v == value
+ // Fast path: for inline strings (<=12 bytes), the entire value
+ // is stored in the u128 view, so we can compare directly
+ // This avoids the expensive conversion back to bytes
+ if len <= 12 {
+ return header.view == view_u128;
+ }
+
+ // For larger strings: first compare the 4-byte prefix (bytes 4-7 of u128)
Review Comment:
Thanks for the insight! That makes sense: the hash comparison is doing most of the heavy lifting. I'll keep the code as-is, since it's still slightly better and doesn't add complexity.
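The lookup in the hunk relies on the Arrow `StringView`/`BinaryView` layout. A 16-byte view stores the length in bytes 0-3. Values of up to 12 bytes are then stored inline and zero-padded. Longer values instead keep a 4-byte prefix (bytes 4-7), a buffer index and an offset. Below is a minimal standalone sketch of the equality check under that assumption. `Header`, `make_view`, `view_prefix` and `entry_matches` are hypothetical names, not DataFusion's API. The length check on the long-value path is also an assumption, because the hunk is cut off right after the prefix comment.

```rust
// Hypothetical sketch assuming the Arrow view layout; not DataFusion's actual code.

/// Hypothetical map entry: the hunk shows `hash`, `view` and (in removed lines) `view_idx`.
struct Header {
    hash: u64,
    view: u128,
    view_idx: usize,
}

/// Length: bytes 0-3, i.e. the low 32 bits when the view is read little-endian.
fn view_len(view: u128) -> u32 {
    (view & 0xFFFF_FFFF) as u32
}

/// Prefix of a long value: bytes 4-7 of the view.
fn view_prefix(view: u128) -> u32 {
    ((view >> 32) & 0xFFFF_FFFF) as u32
}

/// Builds a view: <= 12 bytes are inlined and zero-padded; longer values
/// store length, 4-byte prefix, buffer index and offset.
fn make_view(value: &[u8], buffer_index: u32, offset: u32) -> u128 {
    let mut bytes = [0u8; 16];
    bytes[0..4].copy_from_slice(&(value.len() as u32).to_le_bytes());
    if value.len() <= 12 {
        bytes[4..4 + value.len()].copy_from_slice(value);
    } else {
        bytes[4..8].copy_from_slice(&value[..4]);
        bytes[8..12].copy_from_slice(&buffer_index.to_le_bytes());
        bytes[12..16].copy_from_slice(&offset.to_le_bytes());
    }
    u128::from_le_bytes(bytes)
}

/// Equality check in the spirit of the `find_mut` closure: hash first,
/// then the inline fast path, then length + prefix, then the full bytes.
fn entry_matches(header: &Header, hash: u64, view: u128, value: &[u8], stored: &[Vec<u8>]) -> bool {
    if header.hash != hash {
        return false; // a differing hash rejects almost every non-match
    }
    let len = view_len(view);
    if len <= 12 {
        // Inline value: one u128 comparison covers length and all bytes.
        return header.view == view;
    }
    // Long value: cheap checks on the view before reading the data buffer.
    if view_len(header.view) != len || view_prefix(header.view) != view_prefix(view) {
        return false;
    }
    stored[header.view_idx].as_slice() == value
}

fn main() {
    let stored = vec![b"a fairly long string".to_vec()];
    let long_header = Header { hash: 42, view: make_view(&stored[0], 0, 0), view_idx: 0 };
    let short_header = Header { hash: 7, view: make_view(b"short", 0, 0), view_idx: 0 };

    // Inline values compare as a single u128.
    assert!(entry_matches(&short_header, 7, make_view(b"short", 0, 0), b"short", &stored));
    assert!(!entry_matches(&short_header, 7, make_view(b"shorT", 0, 0), b"shorT", &stored));

    // Long values: same length and prefix, so the full bytes decide.
    let same = b"a fairly long string";
    assert!(entry_matches(&long_header, 42, make_view(same, 1, 99), same, &stored));
    let differs = b"a fairly long strinG";
    assert!(!entry_matches(&long_header, 42, make_view(differs, 0, 0), differs, &stored));
    println!("all checks passed");
}
```

Once the hashes agree, the entry is almost always a true match. The fast paths therefore mostly replace the final byte comparison (entirely, for inline values) rather than filter out extra candidates, which is consistent with the gain being small.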