Steven Schlansker created KAFKA-19877:
-----------------------------------------
Summary: Clarify Deserializer contract around ByteBuffer position, limit
Key: KAFKA-19877
URL: https://issues.apache.org/jira/browse/KAFKA-19877
Project: Kafka
Issue Type: Improvement
Components: clients
Affects Versions: 4.1.0
Reporter: Steven Schlansker
We use custom Serde implementations. We improved our Serde to avoid
{{byte[]}} copies by implementing the {{deserialize(String topic, Headers headers,
ByteBuffer data)}} overload. This ended up introducing a serious bug in a
distant module that relies on {{serializedValueSize}} on the consumer record.
Per the {{Deserializer}} documentation, we were careful not to assume anything
about the buffer's capacity, etc.
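For context, here is roughly the shape of the change we made (names and the string
payload are made up for illustration, not our actual Serde):
{code:java}
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

import org.apache.kafka.common.header.Headers;
import org.apache.kafka.common.serialization.Deserializer;

// Illustrative only: a deserializer that overrides the ByteBuffer overload so it
// can read the payload directly instead of going through an intermediate byte[] copy.
public class GreetingDeserializer implements Deserializer<String> {

    @Override
    public String deserialize(String topic, byte[] data) {
        return data == null ? null : new String(data, StandardCharsets.UTF_8);
    }

    @Override
    public String deserialize(String topic, Headers headers, ByteBuffer data) {
        if (data == null) {
            return null;
        }
        // Decodes the bytes between position and limit. Note that decode() advances
        // the buffer's position, which turns out to matter (see below).
        return StandardCharsets.UTF_8.decode(data).toString();
    }
}
{code}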
However, I think there are some logical gaps in the contract as currently
written.
Specifically, the documentation says the deserializer "cannot make any assumptions
about the returned ByteBuffer like the position". The contract does not say what you
*can* assume - should you assume the data starts at {{position}} 0? That appears to
be the case in practice, but it is not explicitly stated.
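To make the ambiguity concrete, here are the two readings side by side (a
hypothetical length-prefixed format, not real Kafka code). They only agree when
{{position}} happens to be 0:
{code:java}
import java.nio.ByteBuffer;

final class FramingAmbiguity {
    // Reading 1: the record's bytes begin at the buffer's current position.
    static int lengthPrefixAtPosition(ByteBuffer data) {
        return data.getInt(data.position()); // absolute read, does not move position
    }

    // Reading 2: the record's bytes begin at absolute index 0,
    // wherever position happens to be.
    static int lengthPrefixAtZero(ByteBuffer data) {
        return data.getInt(0);
    }
}
{code}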
Relatedly, since you cannot assume the {{limit}} means anything, every deserializer
must either have some kind of internal framing or be able to safely and cheaply
ignore extra data past the end of its structure. Kafka itself violates this
contract: the default implementation calls {{Utils.toNullableArray}}, which uses
{{buffer.remaining()}} ({{limit - position}}) to determine how many bytes are
left - because what else can you do without another level of framing?
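For reference, the default path effectively behaves like the sketch below (a
paraphrase of {{Utils.toNullableArray}}, not the actual implementation): it copies
exactly the bytes between {{position}} and {{limit}}, i.e. it relies on both to
frame the record:
{code:java}
import java.nio.ByteBuffer;

final class ToNullableArraySketch {
    // Paraphrase: the default deserialize(String, Headers, ByteBuffer) converts the
    // buffer to a byte[] covering [position, limit) and hands that to the byte[] overload.
    static byte[] toNullableArray(ByteBuffer buffer) {
        if (buffer == null) {
            return null;
        }
        byte[] out = new byte[buffer.remaining()]; // remaining() == limit - position
        buffer.duplicate().get(out);               // copy without disturbing the caller's position
        return out;
    }
}
{code}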
I think this would be better if the contract either stated that the data always
starts at {{position}} 0, or guaranteed that the buffer is always positioned at the
start of the record. If we really cannot rely on {{limit}}, then the current call to
{{toNullableArray}} is wrong, and I am not sure how to fix it - so I think the best
fix here is to update the spec to state that {{position}} and {{limit}} can be
relied on to frame the data.
Additionally, the contract does not say anything about the {{position}} of the
buffer *after* the deserializer is done with it. However, this turns out to be
very important when calculating {{serializedValueSize}}: if your deserializer
advances the buffer's position while reading the data (which is *really* easy to
do with {{ByteBuffer}}), those bytes are subtracted from the reported serialized
size, often leaving it at 0 (all bytes consumed by deserialization).
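A self-contained illustration of the failure mode (the size computation is
paraphrased - we are assuming it is derived from {{remaining()}} after
deserialization, which matches what we observed):
{code:java}
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class RemainingVsLimit {
    public static void main(String[] args) {
        ByteBuffer value = ByteBuffer.wrap("hello".getBytes(StandardCharsets.UTF_8));

        // A deserializer that uses relative reads (decode(), get(), getInt(), ...)
        // leaves position == limit once it has consumed the payload.
        String decoded = StandardCharsets.UTF_8.decode(value).toString();

        // Any size derived from remaining() after deserialization is now 0, even though
        // the record carried 5 bytes; limit() is unaffected by the position change.
        System.out.println(decoded + ": remaining=" + value.remaining() + ", limit=" + value.limit());
        // prints: hello: remaining=0, limit=5
    }
}
{code}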
Either the {{CompletedFetch}} and {{ShareCompletedFetch}} implementations should
both be updated to use {{limit}} instead of {{remaining}}, or the {{Deserializer}}
contract should make it very clear that you must not move the {{position}}
pointer.
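In the meantime, a deserializer can protect the caller regardless of which way the
contract goes by reading from a {{duplicate()}}, e.g. (same illustrative
deserializer as above):
{code:java}
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

import org.apache.kafka.common.header.Headers;
import org.apache.kafka.common.serialization.Deserializer;

// Defensive variant: read from a duplicate so the caller's buffer keeps its
// position/limit no matter what the deserializer does internally.
public class PositionPreservingDeserializer implements Deserializer<String> {

    @Override
    public String deserialize(String topic, byte[] data) {
        return data == null ? null : new String(data, StandardCharsets.UTF_8);
    }

    @Override
    public String deserialize(String topic, Headers headers, ByteBuffer data) {
        if (data == null) {
            return null;
        }
        // duplicate() shares the underlying bytes but has independent position/limit/mark,
        // so consuming the duplicate does not change the original buffer's remaining().
        return StandardCharsets.UTF_8.decode(data.duplicate()).toString();
    }
}
{code}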
--
This message was sent by Atlassian Jira
(v8.20.10#820010)