Thanks for your reviews, @NicoK. Sorry for the late update to this PR and for the delay with the benchmark results; I have been a little busy recently.
For my own broadcast benchmark, this change brings an obvious improvement. For non-broadcast cases, however, the throughput of `StreamNetworkThroughputBenchmarkExecutor` seems to be slightly lower than before. After I adjusted the code to keep the same `pruneBuffer()` process as before, the results look a bit better than with the current version, but they are still slightly lower than before (about 1% in some runs). So I guess the other reason is that in the past the `RecordSerializer` maintained the `BufferBuilder` internally and kept copying multiple serialization results into it until it was full. Now, for each record, we have to get the `BufferBuilder` from the arrays in `RecordWriter` and then pass it to the `RecordSerializer`. This is the key difference and the source of the overhead, because the `RecordSerializer` is now stateless. I am still trying to improve other parts to compensate for this loss, and I will update this PR soon based on all the comments above!

[ Full content available at: https://github.com/apache/flink/pull/6417 ]
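As a rough illustration of the difference I mean, here is a toy sketch. It is not the actual Flink code: `ToyBufferBuilder`, `StatefulSerializer`, `StatelessSerializer` and `ToyRecordWriter` are hypothetical stand-ins that only mimic the two shapes of the write path. In the first, the serializer owns the current buffer across records; in the second, the writer looks up the buffer per record and hands it to a stateless serializer.

```java
import java.util.ArrayList;
import java.util.List;

class SerializerSketch {

    /** Hypothetical fixed-capacity buffer; a stand-in, not Flink's BufferBuilder. */
    static final class ToyBufferBuilder {
        private final byte[] data;
        private int position;

        ToyBufferBuilder(int capacity) {
            this.data = new byte[capacity];
        }

        /** Copies as many bytes as fit, starting at offset, and returns the count copied. */
        int append(byte[] src, int offset) {
            int n = Math.min(src.length - offset, data.length - position);
            System.arraycopy(src, offset, data, position, n);
            position += n;
            return n;
        }

        boolean isFull() {
            return position == data.length;
        }
    }

    /** Old shape (assumed): the serializer keeps the current buffer between records. */
    static final class StatefulSerializer {
        private final int capacity;
        private ToyBufferBuilder current;
        final List<ToyBufferBuilder> finished = new ArrayList<>();

        StatefulSerializer(int capacity) {
            this.capacity = capacity;
            this.current = new ToyBufferBuilder(capacity);
        }

        void write(byte[] record) {
            int offset = 0;
            while (offset < record.length) {
                offset += current.append(record, offset);
                if (current.isFull()) {
                    finished.add(current);
                    current = new ToyBufferBuilder(capacity);
                }
            }
        }
    }

    /** New shape (assumed): the serializer holds no buffer; the target is passed in. */
    static final class StatelessSerializer {
        static int copyTo(byte[] record, int offset, ToyBufferBuilder target) {
            return target.append(record, offset);
        }
    }

    /** The writer owns one buffer per channel and fetches it for every record. */
    static final class ToyRecordWriter {
        private final int capacity;
        private final ToyBufferBuilder[] builders;
        final int[] finishedPerChannel;

        ToyRecordWriter(int channels, int capacity) {
            this.capacity = capacity;
            this.builders = new ToyBufferBuilder[channels];
            this.finishedPerChannel = new int[channels];
            for (int i = 0; i < channels; i++) {
                builders[i] = new ToyBufferBuilder(capacity);
            }
        }

        void emit(byte[] record, int channel) {
            int offset = 0;
            while (offset < record.length) {
                // Extra step compared to the stateful shape: array lookup plus hand-over.
                ToyBufferBuilder target = builders[channel];
                offset += StatelessSerializer.copyTo(record, offset, target);
                if (target.isFull()) {
                    finishedPerChannel[channel]++;
                    builders[channel] = new ToyBufferBuilder(capacity);
                }
            }
        }
    }
}
```

Both shapes fill the same number of buffers for the same input; the second one just does the lookup and hand-over on every record, which is the extra per-record work I suspect shows up in the non-broadcast numbers.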
