Hey folks,

I'm working on a system where several different Kafka clients (including Samza) are producing into the same Kafka topic. It's necessary for each of these clients to calculate the same partition hash for the same key input to ensure consistent message ordering (there are some asynchronous actions that need to be ordered across systems). I've been able to get our non-JVM Kafka clients to calculate partition identifiers (using the murmur2 hashing algorithm) in the same manner as the official Java Kafka producers.

However, it looks like Samza uses its own hashing algorithm[0]; this is fine for maintaining order if it's just Samza producing into a topic, but it's not so great if Samza is just one system of many that are working on a multi-stage task.
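
A minimal Python sketch of what a non-JVM port would need to reproduce. It rests on two assumptions that should be checked against [0]: that the partition is computed as abs(partitionKey.hashCode()) % numPartitions, and that the partition key is a java.lang.String. All function names here are hypothetical and are not part of Samza or Kafka. The usual places a port goes wrong are hashing UTF-8 bytes instead of UTF-16 code units, missing the signed 32-bit wraparound, Math.abs(Integer.MIN_VALUE) staying negative, and Java's % truncating toward zero.

```python
INT_MIN = -0x80000000


def java_string_hashcode(s: str) -> int:
    """Mimic java.lang.String.hashCode(): h = 31*h + c over UTF-16 code units."""
    data = s.encode("utf-16-be")  # two bytes per UTF-16 code unit, no BOM
    h = 0
    for i in range(0, len(data), 2):
        unit = (data[i] << 8) | data[i + 1]
        h = (31 * h + unit) & 0xFFFFFFFF  # wrap like a 32-bit int
    # Reinterpret the unsigned value as a signed 32-bit int.
    return h - 0x100000000 if h >= 0x80000000 else h


def java_abs(x: int) -> int:
    """Mimic Math.abs(int): Integer.MIN_VALUE is returned unchanged."""
    return x if x == INT_MIN else abs(x)


def java_rem(a: int, b: int) -> int:
    """Mimic Java's % operator, which truncates toward zero (Python floors)."""
    r = abs(a) % abs(b)
    return -r if a < 0 else r


def assumed_samza_partition(key: str, num_partitions: int) -> int:
    """ASSUMPTION: Samza computes abs(key.hashCode()) % numPartitions at [0]."""
    return java_rem(java_abs(java_string_hashcode(key)), num_partitions)


if __name__ == "__main__":
    print(java_string_hashcode("hello"))        # 99162322
    print(assumed_samza_partition("hello", 8))  # 2
```

If the partition key is not a String (for example a byte[], whose hashCode() is identity-based in Java), the value cannot be reproduced outside the JVM this way.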
I've dug through the Samza and Kafka codebases quite a bit over the last few days, and I'm at a loss as to how to get Samza to compute partition indexes in a way that's compatible with other producers. I've tried implementing Samza's hashing algorithm in other clients (i.e. with [1]), but cannot for the life of me get equivalent partition calculations in a non-JVM language.

Does anybody know a) if it's possible to define a custom key-to-partition hashing algorithm in Samza, or b) if there is a reliable general-purpose algorithm that can create the same results as Samza's algorithm?

Cheers,
Malcolm McFarland
Cavulus

[0] https://github.com/apache/samza/blob/1.7.0/samza-kafka/src/main/java/org/apache/samza/util/KafkaUtil.java#L47-L49
[1] https://stackoverflow.com/questions/40303333/how-to-replicate-java-hashcode-in-c-language