muralibasani commented on PR #21676: URL: https://github.com/apache/kafka/pull/21676#issuecomment-4033176498
> This PR does not really address the problem. It just shifts the deserialization we try to avoid to a different place.
>
> To really solve the problem, we need to change the `KafkaProducer` code, i.e., the code that builds the "record batches" which are sent to Kafka over the wire. For example, there is the `DefaultRecord#writeTo` method, which iterates over all headers to serialize them -- it's using a for loop, so it's implicitly calling `toArray()`, which triggers the `materialize()` step introduced in this PR, just to serialize the records again...
>
> We need to change the _whole_ call stack, to be able to literally pass a `byte[]` array instead of a `Headers` object into the Producer (maybe via `ProducerRecord`?), and change all code which currently works on the `Headers` object to handle the case where `Headers` is `null` and the new `byte[] rawHeaders` is present.

Tried to make a few changes; they do look complicated, in particular the `ProcessorContextImpl#logChange` and vector clock changes. I passed the raw header bytes through the producer call stack, from the changelog stores all the way down to `DefaultRecord.writeTo()`, so we never deserialize and re-serialize headers just to write them to the changelog topic. When the vector clock is enabled, we manually splice the new entries into the raw byte array instead of materializing a `Headers` object.
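To illustrate the splicing idea, here is a minimal sketch of appending one new header entry to an existing raw header `byte[]` without deserializing it into a `Headers` object. It uses a simplified 4-byte length-prefixed layout for clarity; the real `DefaultRecord` wire format encodes header key/value lengths as varints, but the splice itself works the same way. `RawHeaderSplicer` and `appendHeader` are hypothetical names, not part of this PR:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: splice one new header entry onto an existing raw
// header byte array. Existing entries are copied verbatim and never parsed.
// Layout here is a simplification (4-byte big-endian lengths) of the real
// record format, which uses varint-encoded lengths.
final class RawHeaderSplicer {
    static byte[] appendHeader(byte[] rawHeaders, String key, byte[] value) {
        byte[] keyBytes = key.getBytes(StandardCharsets.UTF_8);
        ByteBuffer out = ByteBuffer.allocate(
                rawHeaders.length + 4 + keyBytes.length + 4 + value.length);
        out.put(rawHeaders);          // existing entries, copied as-is
        out.putInt(keyBytes.length);  // new entry: key length + key bytes
        out.put(keyBytes);
        out.putInt(value.length);     // new entry: value length + value bytes
        out.put(value);
        return out.array();
    }
}
```

In the actual change, the vector clock entry would be serialized once and spliced in this way, so the raw bytes can flow straight through to the changelog write.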
