Zhangyx39 opened a new pull request #1338: SAMZA-2502: Byte array keys be 
partitioned based on array contents in…
URL: https://github.com/apache/samza/pull/1338
 
 
   Issue:
   InMemorySystemProducer uses the hashCode of the partition key to decide which
partition a message goes to. This works well when the key is an object whose
hashCode method can be overridden. But when the partition key is serialized as
a byte[], the message can go to any partition, because the hash code of a byte
array is identity-based: it is derived from the array instance, not from its
contents. Therefore, even though two messages may have the same key, they can
be sent to different partitions once their keys are serialized into separate
byte[] instances, whose hash codes are effectively arbitrary.
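
   To illustrate the issue, here is a minimal standalone sketch. It is not
taken from the Samza code base; the class name ByteArrayHashDemo, its helper
methods, and the sample key "user-42" are made up for this example.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class, not part of Samza.
public class ByteArrayHashDemo {
    // Hash inherited from Object: depends on the array instance, not its contents.
    static int identityHash(byte[] key) {
        return key.hashCode();
    }

    // Hash computed from the bytes themselves.
    static int contentHash(byte[] key) {
        return Arrays.hashCode(key);
    }

    public static void main(String[] args) {
        // Two separately serialized copies of the same logical key.
        byte[] first = "user-42".getBytes(StandardCharsets.UTF_8);
        byte[] second = "user-42".getBytes(StandardCharsets.UTF_8);

        // Not guaranteed to match, and in practice they usually differ.
        System.out.println(identityHash(first) == identityHash(second));

        // Guaranteed to match, since both arrays hold the same bytes.
        System.out.println(contentHash(first) == contentHash(second));
    }
}
```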
   
    
   Fix:
   We want to partition messages based on the contents of the partition keys.
An easy fix: when the key is a byte array, calculate the hash code with
Arrays.hashCode(byte[]), which derives the hash code from the array's
contents rather than from the array instance.
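
   A rough sketch of the idea follows. It is not the actual change in this
pull request: the class PartitionSketch, the method partitionFor, and the use
of Math.floorMod to map the hash onto a partition index are assumptions made
for illustration, and null keys are not handled.

```java
import java.util.Arrays;

// Hypothetical helper, not the real InMemorySystemProducer code.
public class PartitionSketch {
    // Picks a partition index in [0, partitionCount) for a non-null key.
    static int partitionFor(Object partitionKey, int partitionCount) {
        int hash = (partitionKey instanceof byte[])
            ? Arrays.hashCode((byte[]) partitionKey) // content-based for byte arrays
            : partitionKey.hashCode();               // unchanged for other key types
        // floorMod keeps the index non-negative even for negative hash codes.
        return Math.floorMod(hash, partitionCount);
    }
}
```

   With a helper like this, two distinct byte[] instances holding the same
bytes map to the same partition, while keys of other types keep using their
own hashCode.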
   
   Test:
   Added a unit test in TestInMemorySystemProducer.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
