StephanEwen commented on pull request #13501:
URL: https://github.com/apache/flink/pull/13501#issuecomment-702157253


   I took a look only at the partial record handling. I like the idea of not 
transporting the partial record offsets at all across the networks, which means 
we don't have to make the deserializers aware of that.
   
   Regarding the implementation, I have the same comment as @pnowojski . Given 
that the partial lengths fields are not communicated across the network, I feel 
that this logic does not belong into buffers or 
`BufferBuilder`/`BufferConsumer`. Those should handle purely bytes without any 
knowledge of records.
   
   I would suggest to change the `buffers` queue from holding `BufferConsumer` 
to holding `BufferConsumerWithOffsets` or so (maybe come up with a shorter 
name). That way we transport the same information, but not inject it into the 
buffers.
   
   Some more things to watch out for in further refactoring:
   
     - The *per record methods* of *ResultPartition* and its subclasses are 
highly performance sensitive. They should not become virtual methods. We 
currently can guarantee that because there is only one class implementing these 
per-record methods (BufferWritingResultPartition).
       When we start having different implementations, we ideally have only one 
of them loaded in the JVM at any point, so that the de-virtualization works 
properly. That means design the classes/subclasses such that a type of job 
(batch, streaming, streaming with partial failover) would only ever rely on one 
of these partition types in a job.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to