StephanEwen commented on pull request #13501:
URL: https://github.com/apache/flink/pull/13501#issuecomment-702157253
I took a look only at the partial record handling. I like the idea of not
transporting the partial record offsets at all across the networks, which means
we don't have to make the deserializers aware of that.
Regarding the implementation, I have the same comment as @pnowojski . Given
that the partial lengths fields are not communicated across the network, I feel
that this logic does not belong into buffers or
`BufferBuilder`/`BufferConsumer`. Those should handle purely bytes without any
knowledge of records.
I would suggest to change the `buffers` queue from holding `BufferConsumer`
to holding `BufferConsumerWithOffsets` or so (maybe come up with a shorter
name). That way we transport the same information, but not inject it into the
buffers.
Some more things to watch out for in further refactoring:
- The *per record methods* of *ResultPartition* and its subclasses are
highly performance sensitive. They should not become virtual methods. We
currently can guarantee that because there is only one class implementing these
per-record methods (BufferWritingResultPartition).
When we start having different implementations, we ideally have only one
of them loaded in the JVM at any point, so that the de-virtualization works
properly. That means design the classes/subclasses such that a type of job
(batch, streaming, streaming with partial failover) would only ever rely on one
of these partition types in a job.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]