muralibasani commented on PR #21676:
URL: https://github.com/apache/kafka/pull/21676#issuecomment-4033176498

   > This PR does not really address the problem. It just shifts the
   > deserialization we try to avoid to a different place.
   >
   > To really solve the problem, we need to change the `KafkaProducer` code,
   > ie, the code that builds the "record batches" which are sent to Kafka over
   > the wire. For example, there is the `DefaultRecord#writeTo` method, which
   > iterates over all headers to serialize them -- it's using a for loop, so
   > it's implicitly calling `toArray()`, which triggers the `materialize()`
   > step introduced in this PR, just to serialize the records again...
   >
   > We need to change the _whole_ call stack, to be able to literally pass a
   > `byte[]` array instead of a `Headers` object into the Producer (maybe via
   > `ProducerRecord`?), and change all the code which currently works on the
   > `Headers` object to consider the case that `Headers` would be `null` and
   > the new `byte[] rawHeaders` is present.
   
   Tried to make a few changes; they do look complicated, mainly around `ProcessorContextImpl#logChange` and the vector clock handling.
   
   Passed raw header bytes through the producer call stack, from the changelog stores all the way down to `DefaultRecord.writeTo()`, so we never deserialize and re-serialize headers just to write them to the changelog topic.
   
   When the vector clock is enabled, we manually splice the new entries into the raw byte array instead of materializing a `Headers` object.
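   That splice could look roughly like the sketch below (assumed names, not the PR's code), following Kafka's header wire layout of a zigzag-varint count followed by varint-length-prefixed key/value pairs; null header values (length -1) are left out for brevity. Adding an entry then means re-encoding only the leading count and appending the new pair, with no `Headers` allocation.

   ```java
   import java.io.ByteArrayOutputStream;
   import java.nio.charset.StandardCharsets;

   // Sketch of splicing one extra header into an already-serialized header
   // block. Names are illustrative; the varint helpers mirror what Kafka's
   // ByteUtils.writeVarint does (zigzag + 7-bit groups).
   public class HeaderSplice {
       static void writeVarint(ByteArrayOutputStream out, int value) {
           int v = (value << 1) ^ (value >> 31); // zigzag-encode
           while ((v & ~0x7F) != 0) {
               out.write((v & 0x7F) | 0x80);
               v >>>= 7;
           }
           out.write(v);
       }

       static int[] readVarint(byte[] buf, int pos) { // returns {value, bytesRead}
           int raw = 0, shift = 0, i = pos;
           while (true) {
               int b = buf[i++] & 0xFF;
               raw |= (b & 0x7F) << shift;
               if ((b & 0x80) == 0) break;
               shift += 7;
           }
           return new int[] {(raw >>> 1) ^ -(raw & 1), i - pos};
       }

       // Append (key, value) and bump the leading count, all on raw bytes.
       static byte[] splice(byte[] rawHeaders, String key, byte[] value) {
           int[] count = readVarint(rawHeaders, 0);
           ByteArrayOutputStream out = new ByteArrayOutputStream();
           writeVarint(out, count[0] + 1);                                // new count
           out.write(rawHeaders, count[1], rawHeaders.length - count[1]); // old entries
           byte[] k = key.getBytes(StandardCharsets.UTF_8);
           writeVarint(out, k.length);
           out.write(k, 0, k.length);
           writeVarint(out, value.length);
           out.write(value, 0, value.length);
           return out.toByteArray();
       }

       public static void main(String[] args) {
           byte[] empty = new byte[] {0};                   // zero headers
           byte[] spliced = splice(empty, "clock", new byte[] {7});
           System.out.println(spliced.length);              // 9
           System.out.println(readVarint(spliced, 0)[0]);   // count is now 1
       }
   }
   ```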


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
