[ 
https://issues.apache.org/jira/browse/KAFKA-17825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17909000#comment-17909000
 ] 

Matthias J. Sax commented on KAFKA-17825:
-----------------------------------------

I dug into this, and I am actually not sure if it's indeed a bug – my 
understanding is, that the KIP intentionally changes the contract, what means 
it's not a backward compatible change, which the KIP does not clearly call out 
though IMHO.

The sentence
{quote}If someone wants the deserializer to be compatible with older versions 
of the kafka-clients library they should always implement the byte array-based 
deserialize methods.
{quote}
{color:#172b4d}is rather cryptic.{color}

{color:#172b4d}Overall, the contract of `ByteBufferDeserializer` is _not_ 
clearly specified from my POV.{color}

{color:#172b4d}Prior to the KIP, we did make a deep copy, and thus using 
`array()` was "safe" in the sense that one does not need to care about 
arrayOffset,position,limit... Eg, a ByteBuffer could be converted into a String 
via `new String(byteBuffer.array())`.{color}{color:#172b4d} With this KIP, 
using `array()` is not "safe" any longer though and one would need to use 
`{color}{color:#172b4d}new String(buffer.array(), buffer.arrayOffset() + 
buffer.position(), buffer.remaining())` instead.{color}

{color:#172b4d}However, it's not really clear what the contract is (not in 3.5, 
nor in 3.6). I assume that the statement in the KIP was supposed to mean, if 
you want a deep-copy, use the old `deserialize(String topic, byte[] data)` but 
that is not really practical, because if I use the `KafkaConsumer` I don't 
control which method the consumer calls, and it will in fact call the new 
`deserialize(String topic, Headers headers, ByteBuffer data)` so as a user I 
cannot force a deep copy. (User code does usually not call `deserialize()` 
themselves...){color}

{color:#172b4d}In the end, `ByterBufferDeserializer` is not an interface and 
thus "should always implement the byte array-based deserialize methods" does no 
apply – the user does not implement anything when using this class... They just 
plug it into the KafkaConsumer.{color}

{color:#172b4d}In the end, the goal of the KIP was to avoid a deep-copy, but it 
did introduce a breaking change... This ship has sailed so the only thing we 
can do is to actually update the KIP, docs, and JavaDocs IMHO. Of course, we 
could also fall back to the old behavior, but this seems to defeat the purpose, 
and people can still force a deep-copy of the data by using 
`ByteArrayDeserializer` instead of `ByteBufferDeserializer`. I agree to the 
idea that `ByteBufferDeserializer` is more advanced and thus should avoid a 
deep-copy, and users need to understand how to use it correctly.{color}

{color:#172b4d}Thoughts? \cc [~LSK] [~pnee] (also [~guozhang] [~ChrisEgerton] 
[~showuon] who reviews and vote the KIP).{color}

> ByteBufferDeserializaer's array size can be inconsistent with the older 
> version
> -------------------------------------------------------------------------------
>
>                 Key: KAFKA-17825
>                 URL: https://issues.apache.org/jira/browse/KAFKA-17825
>             Project: Kafka
>          Issue Type: Bug
>          Components: consumer
>    Affects Versions: 3.6.0
>            Reporter: Philip Nee
>            Assignee: LinShunkang
>            Priority: Blocker
>             Fix For: 4.0.0, 3.9.1, 3.8.2
>
>
> We've noticed that using the ByteBufferDeserializer can yield a different 
> byte array length compare to the deserializer from 3.5.2.  This is attributed 
> by KIP-863, in particular, the old deserializer truncated the byte array 
> starting from
> `buffer.position() + buffer.arrayOffset() + offset` using `Utils.toArray`
>  
> Whereas the current implementation is a passthrough.
>  
> This can be reproduced using the
> {code:java}
> KafkaConsumerProducerDemo.java{code}
>  by changing the type to <Integer, ByteBuffer> and perform a print after poll.
> For example, the producer produces a record  [0, test0] (key is an int, 
> "test0" is a 5 bytes long string, converted to byte buffer using 
> {code:java}
> ByteBuffer.wrap(value.getBytes()){code}
> Prior to KIP-863 we see the following after polling the record from consumer:
> 3.5.2: test0
> 3.6.0: 
> {code:java}
> ?$���y�NNޅ�-p�=�����NAc�����8D���8D���������������
> test0{code}
>  
> And if you analyze the ByteBuffer post 3.6.0, we can see the current offset 
> is at 140 with array length of 149.
>  
> [~LSK] - since you wrote the kip and did the implementation, can you address 
> this ?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to