Samza partition hashing relative to other clients

Malcolm McFarland Wed, 14 Dec 2022 20:01:44 -0800

Hey folks,

I'm working on a system where several different Kafka clients (including
Samza) are producing into the same Kafka topic. It's necessary for each of
these clients to calculate the same partition hash for the same key input
to ensure consistent message ordering (there are some asynchronous actions
that need to be ordered across systems). I've been able to get our non-JVM
Kafka clients to calculate partition identifiers (using the murmur2 hashing
algorithm) in the same manner as the official Java Kafka producers.
However, it looks like Samza uses its own hashing algorithm[0]; this is
fine for maintaining order if it's just Samza producing into a topic, but
it's not so great if Samza is just one system of many that are working on a
multi-stage task.
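
For reference, here is a minimal Python sketch of the murmur2-based
calculation I mean for the non-JVM clients. It assumes the key is already
serialized to bytes and that the producer uses the stock Java partitioner
for keyed messages (murmur2 of the key bytes, sign bit masked off, modulo
the partition count). The function names are my own, not taken from any
client library.

```python
def murmur2(data: bytes) -> int:
    """32-bit murmur2 with the seed used by the Java Kafka client.

    Returns the hash as an unsigned 32-bit integer; all arithmetic is
    masked to 32 bits to mimic Java int overflow.
    """
    mask = 0xFFFFFFFF
    m = 0x5BD1E995
    r = 24
    length = len(data)
    h = (0x9747B28C ^ length) & mask

    # Process the input four bytes at a time, read as little-endian ints.
    for i in range(0, length - length % 4, 4):
        k = int.from_bytes(data[i:i + 4], "little")
        k = (k * m) & mask
        k ^= k >> r
        k = (k * m) & mask
        h = (h * m) & mask
        h ^= k

    # Mix in the 1 to 3 trailing bytes, if any.
    tail = length & ~3
    rem = length % 4
    if rem == 3:
        h ^= data[tail + 2] << 16
    if rem >= 2:
        h ^= data[tail + 1] << 8
    if rem >= 1:
        h ^= data[tail]
        h = (h * m) & mask

    h ^= h >> 13
    h = (h * m) & mask
    h ^= h >> 15
    return h


def kafka_default_partition(key: bytes, num_partitions: int) -> int:
    """Partition index for a keyed message: positive murmur2 mod partitions."""
    return (murmur2(key) & 0x7FFFFFFF) % num_partitions
```

Every client that applies this to the same key bytes and partition count
should land on the same partition, which is the property I need Samza to
share.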


I've dug through the Samza and Kafka codebases quite a bit over the last
few days, and I'm at a loss about how to get Samza to hash partition
indexes in a way that's compatible with other producers. I've tried
implementing Samza's hashing algorithm in other clients (i.e., with [1]), but
cannot for the life of me get equivalent partition calculations in a
non-JVM language.
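
To make the attempt concrete, this is roughly what I've been trying, shown
as a Python sketch. It rests on two assumptions of mine: that the partition
key is a java.lang.String, and that the code at [0] boils down to
abs(key.hashCode()) % numPartitions. Keys of other types (a byte[], for
example, which does not hash by content in Java) would not be covered by
this. The function names are hypothetical.

```python
def java_string_hashcode(s: str) -> int:
    """Replicate java.lang.String.hashCode() as a signed 32-bit int.

    Java hashes UTF-16 code units, so the string is encoded as UTF-16
    first to handle characters outside the Basic Multilingual Plane.
    """
    data = s.encode("utf-16-be")
    h = 0
    for i in range(0, len(data), 2):
        unit = (data[i] << 8) | data[i + 1]
        h = (31 * h + unit) & 0xFFFFFFFF
    # Reinterpret the unsigned value as a signed 32-bit int.
    return h - 0x100000000 if h >= 0x80000000 else h


def java_abs(x: int) -> int:
    """Math.abs for a 32-bit int: Integer.MIN_VALUE stays negative."""
    return x if x == -0x80000000 else abs(x)


def java_rem(a: int, b: int) -> int:
    """Java's % operator: the result takes the sign of the dividend."""
    r = abs(a) % abs(b)
    return -r if a < 0 else r


def assumed_samza_partition(key: str, num_partitions: int) -> int:
    """ASSUMPTION: abs(key.hashCode()) % numPartitions, per my reading of [0]."""
    return java_rem(java_abs(java_string_hashcode(key)), num_partitions)


print(java_string_hashcode("hello"))        # 99162322
print(assumed_samza_partition("hello", 8))  # 2
```

Two places where a port can silently diverge from the JVM are the
32-bit overflow wraparound and the sign handling of abs and %, so those
are spelled out separately above.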

Does anybody know a) if it's possible to define a custom key-to-partition
hashing algorithm in Samza, or b) if there is a reliable general-purpose
algorithm that can create the same results as Samza's algorithm?

Cheers,
Malcolm McFarland
Cavulus

[0]
https://github.com/apache/samza/blob/1.7.0/samza-kafka/src/main/java/org/apache/samza/util/KafkaUtil.java#L47-L49
[1]
https://stackoverflow.com/questions/40303333/how-to-replicate-java-hashcode-in-c-language
