On 2025-08-22 20:57, Morten Brørup wrote:
+static inline int
+rte_hash_k2_cmp_eq(const void *key1, const void *key2, size_t key_len
__rte_unused)
+{
+ const uint16_t *k1 = key1;
+ const uint16_t *k2 = key2;
+
What we do now is to require the keys are 16-bit aligned (which wasn't
the case before).
You could
uint16_t k1;
memcpy(&k1, key1, sizeof(uint16_t));
instead.
Would generate the same code, but be safe from any future alignment issues.
Or use the unaligned types, e.g.:
const unaligned_uint16_t *k1 = key1;
const unaligned_uint16_t *k2 = key2;
Could you explain why that is safe? Doesn't
__attribute__((__aligned__(1)))
just say specify the object doesn't have any alignment requirements,
without asking the compiler to deal with it?
It is safe because the compiler does deal with it.
Here's a simple example:
https://godbolt.org/z/r39zdeEcx
I see. Thanks!
I don't know how MSVC deals with it, but it doesn't support alignment sensitive
architectures, so no real problem.
Anyway, maybe it's safe to assume the keys are aligned, so this is not
an issue.
Lots of DPDK code ignores alignment issues.
+ return k1[0] ^ k2[0];
+}
Haven't you implemented "neq" rather than "eq" here? If the keys are
equal, the result is 0. Should be != 0.
Not a bug.
Well, the function body doesn't do what the function name tells it. :)
Agree. They really should be renamed to _cmp_neq.
And _cmp_eq wrappers could be kept for backwards API compatibility. Possibly
marked deprecated.
These hash compare functions are in fact "neq", not "eq".
Having "_cmp_eq" in the function names is extremely unfortunate and
misleading.
Would it be worth adding a comment like "use XOR to make this
branch-free"? It may not be obvious to all readers.
That said, I’m not sure this trick will actually change the generated
object code - especially if the result of the eq function is still used
in a conditional afterward. Anyway, keeping it seems like a good
conservative approach.
I wonder if any compiler is clever enough to use a different memcmp
implementation if we inform the compiler at build time that we don't care if
key1 is less than or greater key2, only if they differ or not.
All what is needed is a constant-size length. (Only tested with the most
recent GCC and clang.)
Yes. But that was not what I was thinking about... I was wondering if
"memcmp()!=0" compiles to code that calls some other memcmp implementation that
doesn't check which of the two strings is lower, but only tests if they differ.
I don't think so. On GCC 15.2 at least, for non-constant buffer lengths,
glibc memcmp() will always be invoked.
At least GCC will emit a cmp instruction though (so not branch free), if
that matters.
If so, the OTHER_BYTES handler shouldn't call memcmp() directly, but a
wrapper around it:
rte_hash_k_cmp_eq(const void *key1, const void *key2, size_t key_len
__rte_unused)
{
return memcmp(key1, key2, key_len) != 0;
}
Should work. (Remove __rte_unused.)
I just tested it in Compiler Explorer, and the function calls the same
memcmp(), regardless of the !=0.
So no benefit compared to calling memcmp() directly from the jump table.
The rte_hash_k_cmp_eq() must be invoked from some other function, which
provides a constant key_len, for the magic to happen.
I would keep this function (maybe as __rte_hash_cmp_neq()) for that purpose.