StephanEwen commented on pull request #13920: URL: https://github.com/apache/flink/pull/13920#issuecomment-723189562
Sorry to be late to the game here, but could you share a bit more information on what the original setup was? Specifically, which checkpoint storage system offered such bad stream read performance? Was it HDFS? OSS? S3?

Looking at this change, it seems very big (40 files) for "just" introducing a buffer in a stream. So I tend to be -1 on the change as it is.

Two other options to solve this:

(1) Input stream buffering is a property of the `CheckpointStorage`. The buffered stream is created there, rather than in the state backends, which would otherwise have to wrap the stream.

(2) Alternatively, we can make it a contract that all FileSystem implementations return well-buffered streams. Some already do this by default; wrapping them in another buffered stream just adds another layer and extra copying of bytes, which costs performance.

At first glance, I'd say let's go with option (2) if possible, otherwise option (1). Hence the question: which FS did you use that had such poorly performing streams?

----------------------------------------------------------------
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected]
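
For illustration, here is a minimal sketch of the idea behind option (1): buffering is applied once, at the place where the stream is opened, and a stream that is already buffered is not wrapped again. `StreamOpener`, `bufferOnce`, and `buffered` are hypothetical names invented for this sketch and are not part of Flink's API. The `instanceof` check only recognizes `java.io.BufferedInputStream`; a file system that buffers internally in some other way would still get wrapped, which is one reason a contract as in option (2) may be the cleaner route.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

class BufferedOpenSketch {

    /** Hypothetical stand-in for the component that opens checkpoint input streams. */
    interface StreamOpener {
        InputStream open() throws IOException;
    }

    /** Wraps the stream in a buffer unless it is already a BufferedInputStream. */
    static InputStream bufferOnce(InputStream in, int bufferSize) {
        if (in instanceof BufferedInputStream) {
            // Already buffered: another layer would only add extra byte copying.
            return in;
        }
        return new BufferedInputStream(in, bufferSize);
    }

    /** Decorates an opener so that callers always receive a buffered stream. */
    static StreamOpener buffered(StreamOpener raw, int bufferSize) {
        return () -> bufferOnce(raw.open(), bufferSize);
    }

    public static void main(String[] args) throws IOException {
        // An in-memory stream stands in for a stream read from checkpoint storage.
        StreamOpener raw = () -> new ByteArrayInputStream(new byte[] {1, 2, 3});
        try (InputStream in = buffered(raw, 8192).open()) {
            System.out.println(in.read());
        }
    }
}
```

With this shape, code that reads state never wraps streams itself; it only calls `open()` on whatever opener it is handed.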
