Steven Schlansker created KAFKA-19877:
-----------------------------------------

             Summary: Clarify Deserializer contract around ByteBuffer position, limit
                 Key: KAFKA-19877
                 URL: https://issues.apache.org/jira/browse/KAFKA-19877
             Project: Kafka
          Issue Type: Improvement
          Components: clients
    Affects Versions: 4.1.0
            Reporter: Steven Schlansker


We use custom Serde implementations, and we improved them to avoid taking 
{{byte[]}} copies by using the {{deserialize(String topic, Headers headers, 
ByteBuffer data)}} overload. This ended up introducing a serious bug in a 
distant module that relied on {{serializedValueSize}} on the consumer record.

Per the {{Deserializer}} documentation, we were careful not to assume anything 
about the buffer's capacity, etc.

However, I think there are some logical gaps in the contract as currently 
written.

Specifically, the deserialize call "cannot make any assumptions about the 
returned ByteBuffer like the position". The contract does not say what you 
*can* assume - should you assume the data starts at position 0? That appears 
to be the case in practice, but it is never stated explicitly.
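To illustrate why this matters (a hypothetical example, not Kafka's actual fetch path): if the consumer ever handed out a view into a larger fetch buffer, a deserializer that indexes from absolute position 0 would read the wrong bytes, while one that honors {{position}} would not:

```java
import java.nio.ByteBuffer;

public class PositionAssumption {
    // Hypothetical read that assumes the record starts at absolute index 0.
    static byte firstByteAbsolute(ByteBuffer data) {
        return data.get(0); // ignores the buffer's position
    }

    // The same read honoring the current position.
    static byte firstByteRelative(ByteBuffer data) {
        return data.get(data.position());
    }

    public static void main(String[] args) {
        // Pretend this is a larger fetch buffer; the record value is the last 3 bytes.
        ByteBuffer fetch = ByteBuffer.wrap(new byte[]{9, 9, 9, 1, 2, 3});
        fetch.position(3); // the record starts here, not at index 0

        System.out.println(firstByteAbsolute(fetch)); // prints 9: bytes before the record
        System.out.println(firstByteRelative(fetch)); // prints 1: the record's first byte
    }
}
```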

Relatedly, since you cannot assume the {{limit}} means anything, every 
deserializer must either have some kind of internal framing or be able to 
safely and cheaply ignore extra data past the end of its structure. Kafka 
itself violates this contract: the default implementation calls 
{{Utils.toNullableArray}}, which uses {{buffer.remaining()}} ({{limit - 
position}}) to determine how many bytes are left, because what else can you 
do without another level of framing?
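A rough sketch of that {{remaining()}}-based framing (illustrative only, not Kafka's exact implementation of {{Utils.toNullableArray}}): with no other framing available, {{remaining()}} is the only way to know how many bytes belong to this record, so {{limit}} effectively *is* the frame boundary:

```java
import java.nio.ByteBuffer;

public class RemainingFraming {
    // Sketch of a byte[]-producing path: the record's extent is taken to be
    // exactly [position, limit), i.e. remaining() bytes.
    static byte[] toNullableArray(ByteBuffer buffer) {
        if (buffer == null) return null;
        byte[] dest = new byte[buffer.remaining()];
        // Read through a duplicate so the caller's position is left untouched.
        buffer.duplicate().get(dest);
        return dest;
    }
}
```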

I think this would be better if the contract either stated that the data 
always starts at position 0, or guaranteed that the buffer is always 
positioned at the start of the record. If we really cannot rely on {{limit}}, 
then the current call to {{toNullableArray}} is wrong, and I am not sure how 
to fix it - so I think updating the spec to state that {{position}} and 
{{limit}} can be relied on to frame the data is the best fix here.

Additionally, the contract does not say anything about the {{position}} of the 
buffer *after* the deserializer is done with it. However, this turns out to be 
very important in calculating {{serializedValueSize}} - if your deserializer 
advances the position of the buffer while reading the data (which is *really* 
easy to do with {{ByteBuffer}}), that subtracts from the computed serialized 
size, often leaving it as 0 (all bytes consumed by deserialization). Either 
the {{CompletedFetch}} and {{ShareCompletedFetch}} implementations should both 
be updated to use {{limit}} instead of {{remaining}}, or the {{Deserializer}} 
contract should make it very clear that you must not move the {{position}} 
pointer.
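A minimal sketch of the pitfall (hypothetical reader method, not code from Kafka): any relative read advances the position, so a size computed from {{remaining()}} *after* deserialization comes out short:

```java
import java.nio.ByteBuffer;

public class PositionPitfall {
    // Hypothetical deserializer body that reads via a relative get,
    // advancing the buffer's position as a side effect.
    static int readIntGreedily(ByteBuffer data) {
        return data.getInt(); // moves position forward by 4
    }

    public static void main(String[] args) {
        ByteBuffer record = ByteBuffer.allocate(4).putInt(42);
        record.flip(); // position=0, limit=4: four serialized bytes

        System.out.println(record.remaining()); // prints 4: the true serialized size
        int value = readIntGreedily(record);    // value == 42
        System.out.println(record.remaining()); // prints 0: a remaining()-based size is now wrong
    }
}
```

A {{limit}}-based calculation would still report 4 here regardless of where deserialization left the position.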



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
