[ https://issues.apache.org/jira/browse/KAFKA-17825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17909000#comment-17909000 ]
Matthias J. Sax commented on KAFKA-17825: ----------------------------------------- I dug into this, and I am actually not sure if it's indeed a bug – my understanding is, that the KIP intentionally changes the contract, what means it's not a backward compatible change, which the KIP does not clearly call out though IMHO. The sentence {quote}If someone wants the deserializer to be compatible with older versions of the kafka-clients library they should always implement the byte array-based deserialize methods. {quote} {color:#172b4d}is rather cryptic.{color} {color:#172b4d}Overall, the contract of `ByteBufferDeserializer` is _not_ clearly specified from my POV.{color} {color:#172b4d}Prior to the KIP, we did make a deep copy, and thus using `array()` was "safe" in the sense that one does not need to care about arrayOffset,position,limit... Eg, a ByteBuffer could be converted into a String via `new String(byteBuffer.array())`.{color}{color:#172b4d} With this KIP, using `array()` is not "safe" any longer though and one would need to use `{color}{color:#172b4d}new String(buffer.array(), buffer.arrayOffset() + buffer.position(), buffer.remaining())` instead.{color} {color:#172b4d}However, it's not really clear what the contract is (not in 3.5, nor in 3.6). I assume that the statement in the KIP was supposed to mean, if you want a deep-copy, use the old `deserialize(String topic, byte[] data)` but that is not really practical, because if I use the `KafkaConsumer` I don't control which method the consumer calls, and it will in fact call the new `deserialize(String topic, Headers headers, ByteBuffer data)` so as a user I cannot force a deep copy. (User code does usually not call `deserialize()` themselves...){color} {color:#172b4d}In the end, `ByterBufferDeserializer` is not an interface and thus "should always implement the byte array-based deserialize methods" does no apply – the user does not implement anything when using this class... They just plug it into the KafkaConsumer.{color} {color:#172b4d}In the end, the goal of the KIP was to avoid a deep-copy, but it did introduce a breaking change... This ship has sailed so the only thing we can do is to actually update the KIP, docs, and JavaDocs IMHO. Of course, we could also fall back to the old behavior, but this seems to defeat the purpose, and people can still force a deep-copy of the data by using `ByteArrayDeserializer` instead of `ByteBufferDeserializer`. I agree to the idea that `ByteBufferDeserializer` is more advanced and thus should avoid a deep-copy, and users need to understand how to use it correctly.{color} {color:#172b4d}Thoughts? \cc [~LSK] [~pnee] (also [~guozhang] [~ChrisEgerton] [~showuon] who reviews and vote the KIP).{color} > ByteBufferDeserializaer's array size can be inconsistent with the older > version > ------------------------------------------------------------------------------- > > Key: KAFKA-17825 > URL: https://issues.apache.org/jira/browse/KAFKA-17825 > Project: Kafka > Issue Type: Bug > Components: consumer > Affects Versions: 3.6.0 > Reporter: Philip Nee > Assignee: LinShunkang > Priority: Blocker > Fix For: 4.0.0, 3.9.1, 3.8.2 > > > We've noticed that using the ByteBufferDeserializer can yield a different > byte array length compare to the deserializer from 3.5.2. This is attributed > by KIP-863, in particular, the old deserializer truncated the byte array > starting from > `buffer.position() + buffer.arrayOffset() + offset` using `Utils.toArray` > > Whereas the current implementation is a passthrough. > > This can be reproduced using the > {code:java} > KafkaConsumerProducerDemo.java{code} > by changing the type to <Integer, ByteBuffer> and perform a print after poll. > For example, the producer produces a record [0, test0] (key is an int, > "test0" is a 5 bytes long string, converted to byte buffer using > {code:java} > ByteBuffer.wrap(value.getBytes()){code} > Prior to KIP-863 we see the following after polling the record from consumer: > 3.5.2: test0 > 3.6.0: > {code:java} > ?$���y�NNޅ�-p�=�����NAc�����8D���8D��������������� > test0{code} > > And if you analyze the ByteBuffer post 3.6.0, we can see the current offset > is at 140 with array length of 149. > > [~LSK] - since you wrote the kip and did the implementation, can you address > this ? -- This message was sent by Atlassian Jira (v8.20.10#820010)