StephanEwen commented on pull request #13920:
URL: https://github.com/apache/flink/pull/13920#issuecomment-723189562


   Sorry to be late to the game here, but could you share a bit more 
information on what the original setup was?
   Specifically, which checkpoint storage system gave you such poor stream 
read performance? Was it HDFS? OSS? S3?
   
   Looking at this change here, it seems very big (40 files) for "just" 
introducing a buffer in a stream.
   So I tend to be -1 on the change as it is.
   
   Two other options to solve this:
   
   (1) Make input stream buffering a property of the `CheckpointStorage`. The 
buffered stream is created there, rather than having each state backend wrap 
the stream.
   
   (2) Alternatively, we can make it a contract that all FileSystem 
implementations return well-buffered streams. Some already do this by default; 
wrapping those in another buffered stream just adds another layer and extra 
copying of bytes, which costs performance.
   
   At first glance, I'd say let's go with option (2) if possible, otherwise 
option (1).
   Hence the question: Which FS did you use that had such poorly performing 
streams?
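
   To make the double-buffering concern concrete, here is a minimal sketch in 
plain Java. It is not Flink code: `BufferedOpenSketch`, `ensureBuffered`, 
`CountingInputStream` and `drainByteWise` are hypothetical names, and the 
in-memory source only stands in for a remote file system stream. The idea is 
that whoever opens the stream buffers it exactly once and skips streams that 
are already buffered.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical helper, not part of Flink: buffers a stream once, where it is opened. */
final class BufferedOpenSketch {

    /** Counts read calls that reach the wrapped stream (a stand-in for costly remote reads). */
    static final class CountingInputStream extends FilterInputStream {
        int calls;

        CountingInputStream(InputStream in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            calls++;
            return super.read();
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            calls++;
            return super.read(b, off, len);
        }
    }

    /** Wraps the stream unless it is already a BufferedInputStream, avoiding a second copy layer. */
    static InputStream ensureBuffered(InputStream in, int bufferSize) {
        return (in instanceof BufferedInputStream) ? in : new BufferedInputStream(in, bufferSize);
    }

    /** Reads one byte at a time, as a deserializer might, and returns the number of bytes read. */
    static int drainByteWise(InputStream in) throws IOException {
        int n = 0;
        while (in.read() != -1) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[64 * 1024];

        // Byte-wise reads hit the source directly.
        CountingInputStream raw = new CountingInputStream(new ByteArrayInputStream(data));
        drainByteWise(raw);

        // Same reads, but the source is only touched when the buffer needs refilling.
        CountingInputStream source = new CountingInputStream(new ByteArrayInputStream(data));
        drainByteWise(ensureBuffered(source, 4096));

        System.out.println("unbuffered calls: " + raw.calls + ", buffered calls: " + source.calls);
    }
}
```

   The `instanceof` check only recognizes `BufferedInputStream`; a file system 
whose streams buffer internally under a different type would still get 
wrapped, which is one reason a contract on the FileSystem side (option 2) 
looks cleaner.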



