Kishore Nallan created SAMZA-839:
------------------------------------

             Summary: KafkaSystemProducer should use the same partitioning hash 
function as Kafka's producer
                 Key: SAMZA-839
                 URL: https://issues.apache.org/jira/browse/SAMZA-839
             Project: Samza
          Issue Type: Bug
          Components: kafka
    Affects Versions: 0.9.1
            Reporter: Kishore Nallan


Samza's KafkaSystemProducer class computes the target partition from the 
partition key using:

abs(envelope.getPartitionKey.hashCode()) % numPartitions

However, Kafka's producer computes the target partition from the record key 
this way:

Utils.abs(Utils.murmur2(record.key())) % numPartitions

Because the two hash functions differ, the same key can land in different 
partitions. This makes it difficult for me to join two data sources on a 
common key when one source is produced by Samza and the other by a default 
Kafka producer.
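
To illustrate the mismatch, here is a minimal, self-contained sketch that 
computes both partition numbers for the same key. It is not the code from 
either project. The class and method names are hypothetical, murmur2 is 
re-implemented from the standard MurmurHash2 algorithm with the seed I believe 
Kafka's Utils uses (an assumption), Utils.abs is assumed to be a sign-bit mask, 
and keys are assumed to be Strings serialized as UTF-8.

```java
import java.nio.charset.StandardCharsets;

class PartitionerMismatch {

    // Samza-style: abs(partitionKey.hashCode()) % numPartitions, as quoted above.
    static int samzaStylePartition(Object partitionKey, int numPartitions) {
        return Math.abs(partitionKey.hashCode()) % numPartitions;
    }

    // Kafka-style: Utils.abs(Utils.murmur2(keyBytes)) % numPartitions.
    // Assumption: Utils.abs clears the sign bit rather than calling Math.abs.
    static int kafkaStylePartition(byte[] keyBytes, int numPartitions) {
        return (murmur2(keyBytes) & 0x7fffffff) % numPartitions;
    }

    // 32-bit MurmurHash2 over a byte array.
    // Assumption: the seed matches the one in Kafka's Utils.murmur2.
    static int murmur2(byte[] data) {
        final int length = data.length;
        final int seed = 0x9747b28c;
        final int m = 0x5bd1e995;
        final int r = 24;

        int h = seed ^ length;

        // Mix four bytes at a time, little-endian.
        for (int i = 0; i < length / 4; i++) {
            final int i4 = i * 4;
            int k = (data[i4] & 0xff)
                    | ((data[i4 + 1] & 0xff) << 8)
                    | ((data[i4 + 2] & 0xff) << 16)
                    | ((data[i4 + 3] & 0xff) << 24);
            k *= m;
            k ^= k >>> r;
            k *= m;
            h *= m;
            h ^= k;
        }

        // Mix the remaining 1 to 3 bytes; the fall-through is intentional.
        final int tail = length & ~3;
        switch (length % 4) {
            case 3:
                h ^= (data[tail + 2] & 0xff) << 16;
            case 2:
                h ^= (data[tail + 1] & 0xff) << 8;
            case 1:
                h ^= data[tail] & 0xff;
                h *= m;
        }

        // Final avalanche.
        h ^= h >>> 13;
        h *= m;
        h ^= h >>> 15;
        return h;
    }

    public static void main(String[] args) {
        final int numPartitions = 8;
        for (String key : new String[] {"user-1", "user-2", "user-3"}) {
            int samza = samzaStylePartition(key, numPartitions);
            int kafka = kafkaStylePartition(
                    key.getBytes(StandardCharsets.UTF_8), numPartitions);
            System.out.println(
                    key + ": samza-style=" + samza + ", kafka-style=" + kafka);
        }
    }
}
```

Note that the two functions also hash different inputs: the Samza-style 
version hashes the key object via hashCode(), while the Kafka-style version 
hashes the serialized key bytes. Even with the same hash function, the two 
partitions would only line up if both sides hashed the same representation.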

As a workaround, I have to modify the upstream job (which uses the default 
Kafka producer) to write to an explicit partition computed with Samza's 
hashing logic.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
