[ 
https://issues.apache.org/jira/browse/KAFKA-3499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15235799#comment-15235799
 ] 

Michael Noll edited comment on KAFKA-3499 at 4/11/16 7:35 PM:
--------------------------------------------------------------

FWIW, we ran into the same problem when handling byte[] in Twitter Algebird.  
Back then I introduced a custom 
[Bytes|https://github.com/twitter/algebird/blob/develop/algebird-core/src/main/scala/com/twitter/algebird/Bytes.scala]
 wrapper for byte arrays ([original 
PR|https://github.com/twitter/algebird/pull/399/files]), which happens to use 
java.nio.ByteBuffer.  The Bytes.scala code might be a good starting point; it 
includes sane implementations of hashCode, ordering/compare, equals, etc.

Note that, by design (for performance reasons), this wrapper does not enforce 
immutability.  See the javadocs in the source link above for details.
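
For readers who want a Java picture of what such a wrapper involves, here is a 
minimal sketch. It is not the Algebird code and not what Kafka ships: the class 
name BytesKey is hypothetical, and the unsigned lexicographic ordering is an 
assumption chosen for illustration. Like the wrapper described above, it does 
not copy the array, so callers must not mutate it after wrapping.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical wrapper for illustration only; not a Kafka or Algebird class.
final class BytesKey implements Comparable<BytesKey> {
    private final byte[] bytes;

    BytesKey(byte[] bytes) {
        // No defensive copy: immutability is left to the caller (a performance trade-off).
        this.bytes = bytes;
    }

    byte[] get() {
        return bytes;
    }

    @Override
    public boolean equals(Object other) {
        // Content-based equality instead of the identity check inherited from Object.
        return other instanceof BytesKey && Arrays.equals(bytes, ((BytesKey) other).bytes);
    }

    @Override
    public int hashCode() {
        // Content-based hash, consistent with equals above.
        return Arrays.hashCode(bytes);
    }

    @Override
    public int compareTo(BytesKey that) {
        // Assumption: unsigned lexicographic order; a shorter array sorts first on a shared prefix.
        int n = Math.min(bytes.length, that.bytes.length);
        for (int i = 0; i < n; i++) {
            int cmp = Integer.compare(bytes[i] & 0xff, that.bytes[i] & 0xff);
            if (cmp != 0) {
                return cmp;
            }
        }
        return Integer.compare(bytes.length, that.bytes.length);
    }
}

class BytesKeyDemo {
    public static void main(String[] args) {
        Map<BytesKey, String> cache = new HashMap<>();
        cache.put(new BytesKey(new byte[] {1, 2, 3}), "value");
        // A second wrapper around an equal-content array finds the same entry.
        System.out.println(cache.get(new BytesKey(new byte[] {1, 2, 3})));
    }
}
```

Skipping the defensive copy keeps wrapping cheap on hot paths, at the cost that 
mutating the array after insertion silently breaks lookups.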



> byte[] should not be used as Map key nor Set member
> ---------------------------------------------------
>
>                 Key: KAFKA-3499
>                 URL: https://issues.apache.org/jira/browse/KAFKA-3499
>             Project: Kafka
>          Issue Type: Sub-task
>          Components: streams
>            Reporter: josh gruenberg
>              Labels: user-experience
>             Fix For: 0.10.0.0
>
>
> On the JVM, an array's equals and hashCode do not incorporate the array 
> contents; arrays inherit Object.equals/hashCode. As a result, collections 
> that rely on equals/hashCode (e.g., HashMap, HashSet, and variants) treat two 
> arrays with equal contents as distinct elements.
> Many of the Kafka Streams internal classes currently use generic HashMaps and 
> Sets to manage caches and invalidation status. For example, 
> RocksDBStore.cacheDirtyKeys is a HashSet<K>. Then, in RocksDBWindowStore, the 
> elements are constructed as RocksDBStore<byte[], byte[]>.
> Similarly, the MemoryLRUCache<K, RocksDBCacheEntry> internally holds a 
> LinkedHashMap<K,V> (map) and a HashSet<K> (keys), and these end up holding 
> byte[] keys. Finally, user code may attempt to use any of these provided 
> types with byte[], with undesirable results.
> Keys that are byte arrays should be wrapped in a type that incorporates the 
> content in its computation of equals/hashCode. java.nio.ByteBuffer is one 
> such type that could be used, but a purpose-built immutable class would 
> likely be a better solution.
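
The behavior described in the issue can be reproduced with a few lines of plain 
Java. The sketch below is illustrative only; the class and method names are 
hypothetical and nothing here is taken from the Kafka Streams code. It adds two 
equal-content arrays to a HashSet, first raw and then wrapped in 
java.nio.ByteBuffer.

```java
import java.nio.ByteBuffer;
import java.util.HashSet;
import java.util.Set;

// Hypothetical demo class; not part of Kafka.
class ByteArrayKeyDemo {
    // Adds two distinct byte[] instances with equal contents and reports the set size.
    static int rawArraySetSize() {
        Set<byte[]> set = new HashSet<>();
        set.add(new byte[] {1, 2, 3});
        set.add(new byte[] {1, 2, 3});
        return set.size();
    }

    // Same as above, but each array is wrapped in a ByteBuffer before insertion.
    static int wrappedSetSize() {
        Set<ByteBuffer> set = new HashSet<>();
        set.add(ByteBuffer.wrap(new byte[] {1, 2, 3}));
        set.add(ByteBuffer.wrap(new byte[] {1, 2, 3}));
        return set.size();
    }

    public static void main(String[] args) {
        System.out.println(rawArraySetSize()); // 2: arrays use identity-based equals/hashCode
        System.out.println(wrappedSetSize());  // 1: ByteBuffer compares its remaining contents
    }
}
```

One caveat with ByteBuffer as a key: its equals and hashCode depend on the 
buffer's remaining bytes, so changing the position, the limit, or the backing 
array after insertion changes the key's identity in the collection.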



