Nicolas created KAFKA-15042:
-------------------------------
Summary: Clarify documentation on Tagged fields
Key: KAFKA-15042
URL: https://issues.apache.org/jira/browse/KAFKA-15042
Project: Kafka
Issue Type: Wish
Components: docs
Affects Versions: 3.1.1
Environment: Using the Ubuntu/Kafka Docker image for testing purposes.
Reporter: Nicolas
Hello,
I am currently working on an implementation of the Kafka protocol.
So far, all my code works as intended, serialising requests and deserialising
responses, as long as I do not use the flexible requests system. I am now trying
to implement flexible requests, but the documentation is scarce on the subject
of tagged fields.
If we take the Request Header v2:
{code:java}
Request Header v2 => request_api_key request_api_version correlation_id client_id TAG_BUFFER
  request_api_key => INT16
  request_api_version => INT16
  correlation_id => INT32
  client_id => NULLABLE_STRING{code}
Here, the BNF convention seems violated: TAG_BUFFER is not a field in this
production, it appears to be a type, yet it does not appear within the detailed
description below the BNF.
TAG_BUFFER also does not refer to any type declared in the documentation,
although it seems to denote the tagged-fields section.
Now when looking at tagged fields, the only mention of them within the
documentation is:
{quote}Note that [KIP-482 tagged
fields|https://cwiki.apache.org/confluence/display/KAFKA/KIP-482%3A+The+Kafka+Protocol+should+Support+Optional+Tagged+Fields]
can be added to a request without incrementing the version number. This offers
an additional way of evolving the message schema without breaking
compatibility. Tagged fields do not take up any space when the field is not
set. Therefore, if a field is rarely used, it is more efficient to make it a
tagged field than to put it in the mandatory schema. However, tagged fields are
ignored by recipients that don't know about them, which could pose a challenge
if this is not the behavior that the sender wants. In such cases, a version
bump may be more appropriate.
{quote}
This links to KIP-482, which does not clearly and explicitly detail the process
of writing and reading those tagged fields.
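Based on my reading of KIP-482, this is the decoding logic I would expect. The class and method names below are my own invention, not Kafka code, and I would appreciate confirmation that the wire format really is an unsigned-varint field count followed by (tag, size, value) triples, with a lone 0x00 byte meaning an empty section:

```java
import java.nio.ByteBuffer;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper sketching the tagged-field wire format as I
// understand it from KIP-482: an unsigned varint count, then for each
// field a tag (unsigned varint), a size in bytes (unsigned varint), and
// the raw value bytes. An empty section is the single byte 0x00.
public class TaggedFieldReader {

    // Unsigned varint: 7 data bits per byte, continuation bit (0x80) set
    // on every byte except the last. Same byte layout as a Protobuf
    // varint, but with no zig-zag step since the value is unsigned.
    static int readUnsignedVarint(ByteBuffer buf) {
        int value = 0;
        int shift = 0;
        while (true) {
            int b = buf.get() & 0xFF;
            value |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return value;
            }
            shift += 7;
        }
    }

    // Reads one tagged-field section, returning tag -> raw value bytes.
    static Map<Integer, byte[]> readTaggedFields(ByteBuffer buf) {
        int count = readUnsignedVarint(buf);
        Map<Integer, byte[]> fields = new LinkedHashMap<>();
        for (int i = 0; i < count; i++) {
            int tag = readUnsignedVarint(buf);
            int size = readUnsignedVarint(buf);
            byte[] value = new byte[size];
            buf.get(value);
            fields.put(tag, value);
        }
        return fields;
    }
}
```

If this reading is correct, a parser that sees 0x00 at the end of a flexible header or body simply records zero tagged fields and moves on.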
I decided to look at existing clients to understand how they handle tagged
fields. I notably looked at KafkaJS (a JavaScript client) and librdkafka, and
to my surprise, they have not implemented tagged fields. In fact, they rely on
a workaround to skip them and ignore them completely.
I also had a look at the [Java client bundled with Kafka, in the TaggedFields
class|https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/common/protocol/types/TaggedFields.java#L64].
Now, I am not a Java developer, so I may not understand this code exactly. I
read the comment and implemented the logic following Google's Protobuf
specification. The problem is that this leads to a request that makes Kafka
output a stack trace (as an aside, it would be appreciated if errors were
handled gracefully instead of just dumping stack traces).
As a reference, I tried to send an ApiVersions (key: 18, version: 3) request.
Converted to hexadecimal, my request reads as follows:
{code:java}
Request header: 00 12 00 03 00 00 00 00 00 07 77 65 62 2d 61 70 69 00
Request body: 01 08 77 65 62 2d 61 70 69 01 06 30 2e 30 2e 31 00
Full request: 00 00 00 23 00 12 00 03 00 00 00 00 00 07 77 65 62 2d 61 70 69 00
01 08 77 65 62 2d 61 70 69 01 06 30 2e 30 2e 31 00
{code}
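For completeness, here is how I would expect the v3 body to be built if my assumptions are right, namely that a COMPACT_STRING is prefixed with its length plus one as an unsigned varint, that mandatory fields carry no Protobuf-style tag bytes, and that the body ends with an empty tagged-field section. The helper below is my own sketch, not Kafka code:

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical encoder for an ApiVersions v3 request body, under my
// assumptions about KIP-482: each COMPACT_STRING is prefixed with
// (length + 1) as an unsigned varint -- no per-field tag bytes, unlike
// Protobuf -- and the body ends with 0x00, an empty tagged-field section.
public class ApiVersionsBody {

    static void writeUnsignedVarint(ByteArrayOutputStream out, int value) {
        // Emit 7 bits at a time, least-significant group first, setting
        // the continuation bit (0x80) on all but the last byte.
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }

    static void writeCompactString(ByteArrayOutputStream out, String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        writeUnsignedVarint(out, bytes.length + 1); // length + 1, per my reading
        out.write(bytes, 0, bytes.length);
    }

    static byte[] encode(String softwareName, String softwareVersion) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeCompactString(out, softwareName);    // client_software_name
        writeCompactString(out, softwareVersion); // client_software_version
        out.write(0x00);                          // empty tagged-field section
        return out.toByteArray();
    }
}
```

Under these assumptions, encode("web-api", "0.0.1") would produce a body that differs from the one I sent above, which is precisely the kind of detail I would like the documentation to settle.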
This creates a buffer underflow error within Kafka:
{code:java}
[2023-05-31 14:14:31,132] ERROR Exception while processing request from 172.21.0.5:9092-172.21.0.3:59228-21 (kafka.network.Processor)
org.apache.kafka.common.errors.InvalidRequestException: Error getting request for apiKey: API_VERSIONS, apiVersion: 3, connectionId: 172.21.0.5:9092-172.21.0.3:59228-21, listenerName: ListenerName(PLAINTEXT), principal: User:ANONYMOUS
Caused by: java.nio.BufferUnderflowException
	at java.base/java.nio.HeapByteBuffer.get(HeapByteBuffer.java:182)
	at java.base/java.nio.ByteBuffer.get(ByteBuffer.java:770)
	at org.apache.kafka.common.protocol.ByteBufferAccessor.readArray(ByteBufferAccessor.java:58)
	at org.apache.kafka.common.protocol.Readable.readUnknownTaggedField(Readable.java:53)
	at org.apache.kafka.common.message.ApiVersionsRequestData.read(ApiVersionsRequestData.java:133)
	at org.apache.kafka.common.message.ApiVersionsRequestData.<init>(ApiVersionsRequestData.java:74)
	at org.apache.kafka.common.requests.ApiVersionsRequest.parse(ApiVersionsRequest.java:119)
	at org.apache.kafka.common.requests.AbstractRequest.doParseRequest(AbstractRequest.java:207)
	at org.apache.kafka.common.requests.AbstractRequest.parseRequest(AbstractRequest.java:165)
	at org.apache.kafka.common.requests.RequestContext.parseRequest(RequestContext.java:95)
	at kafka.network.RequestChannel$Request.<init>(RequestChannel.scala:101)
	at kafka.network.Processor.$anonfun$processCompletedReceives$1(SocketServer.scala:1030)
	at java.base/java.util.LinkedHashMap$LinkedValues.forEach(LinkedHashMap.java:608)
	at kafka.network.Processor.processCompletedReceives(SocketServer.scala:1008)
	at kafka.network.Processor.run(SocketServer.scala:893)
	at java.base/java.lang.Thread.run(Thread.java:829){code}
I attempted many different approaches but I have to admit that I am making
absolutely no progress.
I am creating this issue for two reasons:
* Obviously, I would very much appreciate a detailed explanation of how to
write and read tagged fields. What kind of bytes are expected? What format do
the values take?
* Since mainstream clients do not implement this feature, it is wasted on most
users. Let's face it: very few people actually implement the binary protocol.
It is a shame that this feature is not widely available to everyone,
especially since it can reduce request sizes. I think improving the
documentation so that tagged fields are clearly explained, with at least one
example, would greatly benefit the community.
Thanks in advance for your answers!
--
This message was sent by Atlassian Jira
(v8.20.10#820010)