[
https://issues.apache.org/jira/browse/SAMZA-839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16618968#comment-16618968
]
Voronenko Dmitriy commented on SAMZA-839:
-----------------------------------------
This problem is described in detail on this page:
[http://www.agardner.me/kafka/big/data/partitioner/java/scala/byte/array/2016/01/23/kafka-partitioning.html]
Supporting a replaceable producer was a good move, but unfortunately the
pre-processing logic breaks everything.
> KafkaSystemProducer should use the same partitioning hash function as Kafka's
> producer
> --------------------------------------------------------------------------------------
>
> Key: SAMZA-839
> URL: https://issues.apache.org/jira/browse/SAMZA-839
> Project: Samza
> Issue Type: Bug
> Components: kafka
> Affects Versions: 0.9.1
> Reporter: Kishore Nallan
> Assignee: Dong Lin
> Priority: Major
>
> Samza's KafkaSystemProducer class computes the target partition using:
> {{abs(envelope.getPartitionKey.hashCode()) % numPartitions}}
> However, Kafka's producer computes it this way:
> {{Utils.abs(Utils.murmur2(record.key())) % numPartitions}}
> This makes it difficult for me to join two data sources on a common key when
> one source is produced by Samza and the other by a default Kafka producer.
> As a work-around, I have to modify the upstream job (that uses the default
> Kafka producer) to write with an explicit partition key using Samza's hashing
> logic.
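
To make the mismatch described above concrete, here is a minimal Java sketch that puts the two formulas side by side. It is an illustration under stated assumptions, not code from Samza or Kafka: the class and method names (PartitionerMismatch, samzaStylePartition, kafkaStylePartition) are hypothetical, the key is assumed to be a String serialized as UTF-8, murmur2 is re-implemented locally in the style of Kafka's Utils.murmur2, and Utils.abs is assumed to behave like a sign-bit mask (n & 0x7fffffff).

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of Samza or Kafka.
final class PartitionerMismatch {

    // Samza-style formula from the issue: abs(key.hashCode()) % numPartitions.
    // Note: Math.abs(Integer.MIN_VALUE) is still negative, so this can go
    // negative for a key whose hashCode is Integer.MIN_VALUE.
    static int samzaStylePartition(Object partitionKey, int numPartitions) {
        return Math.abs(partitionKey.hashCode()) % numPartitions;
    }

    // Kafka-style formula from the issue: abs(murmur2(keyBytes)) % numPartitions.
    // Assumption: abs is modeled as a mask that clears the sign bit.
    static int kafkaStylePartition(byte[] keyBytes, int numPartitions) {
        return (murmur2(keyBytes) & 0x7fffffff) % numPartitions;
    }

    // Local 32-bit MurmurHash2, written to mirror the shape of Kafka's
    // Utils.murmur2 (seed and constants are assumptions taken from memory).
    static int murmur2(byte[] data) {
        final int length = data.length;
        final int seed = 0x9747b28c;
        final int m = 0x5bd1e995;
        final int r = 24;

        int h = seed ^ length;
        final int length4 = length / 4;

        // Mix four bytes at a time, little-endian.
        for (int i = 0; i < length4; i++) {
            final int i4 = i * 4;
            int k = (data[i4] & 0xff)
                    + ((data[i4 + 1] & 0xff) << 8)
                    + ((data[i4 + 2] & 0xff) << 16)
                    + ((data[i4 + 3] & 0xff) << 24);
            k *= m;
            k ^= k >>> r;
            k *= m;
            h *= m;
            h ^= k;
        }

        // Handle the trailing 1 to 3 bytes; the fall-through is intentional.
        final int tail = length & ~3;
        switch (length % 4) {
            case 3:
                h ^= (data[tail + 2] & 0xff) << 16;
            case 2:
                h ^= (data[tail + 1] & 0xff) << 8;
            case 1:
                h ^= data[tail] & 0xff;
                h *= m;
            default:
                break;
        }

        // Final avalanche.
        h ^= h >>> 13;
        h *= m;
        h ^= h >>> 15;
        return h;
    }

    public static void main(String[] args) {
        final String key = "user-42"; // hypothetical join key
        final int numPartitions = 8;  // hypothetical partition count

        int viaHashCode = samzaStylePartition(key, numPartitions);
        int viaMurmur2 = kafkaStylePartition(
                key.getBytes(StandardCharsets.UTF_8), numPartitions);

        // The two values are computed by unrelated hash functions, so they
        // agree only by chance; a join that relies on co-partitioning needs
        // both topics to be written with the same function.
        System.out.println("hashCode-based partition: " + viaHashCode);
        System.out.println("murmur2-based partition:  " + viaMurmur2);
    }
}
```

Since the two functions are unrelated, the same key generally lands in different partitions of the two topics unless both writers are switched to one formula, which is why the work-around above has the upstream job set an explicit partition.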