StephanEwen edited a comment on pull request #17520:
URL: https://github.com/apache/flink/pull/17520#issuecomment-970256639


   @tsreaper Thanks for doing the benchmark.
   
   I am curious to understand what the difference is between "bulk format + 
ArrayList" and "stream format", because the "stream format" also puts 
deserialized records into an ArrayList. Something must differ, otherwise 
there would not be such a large performance gap.
   
   Can we try to identify that, and perhaps update the StreamFormatAdapter to 
perform better?
   
   I would also be curious to understand where the performance difference 
across block sizes comes from in the StreamFormat. The stream format counts 
the batch size in bytes after decompression, so it should be independent of 
Avro's blocks and sync markers. I am puzzled why block size has an impact.
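   A minimal sketch of the reasoning above, assuming batching works purely by 
counting decompressed bytes. The class and method names are hypothetical and 
this is not Flink's StreamFormatAdapter code; records are modeled only by 
their decompressed size, and "blocks" stand in for Avro blocks.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch, not Flink's StreamFormatAdapter: records are grouped into
 * batches by counting decompressed bytes, so batch boundaries depend only on the
 * record sizes and not on how the input was split into (Avro-like) blocks.
 */
public class ByteCountedBatching {

    /** Groups records (given by decompressed size) into batches of at least batchSizeBytes. */
    static List<List<Integer>> batch(List<List<Integer>> blocks, long batchSizeBytes) {
        List<List<Integer>> batches = new ArrayList<>();
        List<Integer> current = new ArrayList<>();
        long bytesInBatch = 0;
        for (List<Integer> block : blocks) {      // block boundaries are never consulted
            for (int recordSize : block) {
                current.add(recordSize);
                bytesInBatch += recordSize;      // only decompressed bytes are counted
                if (bytesInBatch >= batchSizeBytes) {
                    batches.add(current);
                    current = new ArrayList<>();
                    bytesInBatch = 0;
                }
            }
        }
        if (!current.isEmpty()) {
            batches.add(current);               // flush the trailing partial batch
        }
        return batches;
    }

    public static void main(String[] args) {
        // The same ten 100-byte records, delivered as five small blocks or one large block.
        List<List<Integer>> smallBlocks = List.of(
                List.of(100, 100), List.of(100, 100), List.of(100, 100),
                List.of(100, 100), List.of(100, 100));
        List<List<Integer>> largeBlocks = List.of(
                List.of(100, 100, 100, 100, 100, 100, 100, 100, 100, 100));

        List<List<Integer>> a = batch(smallBlocks, 250);
        List<List<Integer>> b = batch(largeBlocks, 250);
        System.out.println("batches: " + a.size() + ", same boundaries: " + a.equals(b));
    }
}
```

   Running it prints `batches: 4, same boundaries: true`. Under this 
assumption the outer loop over blocks has no influence on where a batch ends, 
which is the intuition behind expecting block-size independence.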


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
