1996fanrui commented on pull request #13885: URL: https://github.com/apache/flink/pull/13885#issuecomment-723377721
> Sorry to be late to the game here, but could you share a bit more information on what the original setup was?
> Specifically, what was your checkpoint storage system that offered such bad stream read performance? Was it HDFS? OSS? S3?
>
> Looking at this change here, it seems very big with more options and many changed classes, for "just" introducing a buffer in a stream. That makes me skeptical that this is fixed in the right place.
>
> Two other options to solve this:
>
> (1) Input stream buffering is a property of the `CheckpointStorage`. It is created there, rather than in the state backends that have to wrap the stream. That way it works for all users of the `CheckpointStorage`, not just the state backends that happened to be adjusted to wrap the stream.
>
> (2) Alternatively, we can make it a contract that all FileSystem implementations return well-buffered streams. Some already do this by default, wrapping them with another buffered stream adds just another layer and extra copying of bytes, costing performance. The ones that do not do that are easily adjusted.
>
> At a first glance, I'd say let's go with option (2) if possible, otherwise option (1).
> Hence the question: Which FS did you use that had such bad performing streams?

Thanks for your comments. I use HDFS. The poor read performance comes mainly from the large number of small I/O operations during restore. A Flink job that reproduces the test results is attached to the Jira issue.
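To illustrate why many small reads hurt on an unbuffered stream, here is a minimal sketch using only `java.io`. It is not Flink code and not the change in this PR: `CountingInputStream` is a hypothetical stand-in for an unbuffered file system stream, and the 4-byte read size and 8 KiB buffer are assumed values chosen for the demonstration. It counts how many read calls reach the underlying stream with and without a `BufferedInputStream` in between.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

final class BufferedRestoreSketch {

    /** Hypothetical stand-in for an unbuffered file system stream; counts calls that reach it. */
    static final class CountingInputStream extends InputStream {
        private final InputStream delegate;
        long calls;

        CountingInputStream(InputStream delegate) {
            this.delegate = delegate;
        }

        @Override
        public int read() throws IOException {
            calls++;
            return delegate.read();
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            calls++;
            return delegate.read(b, off, len);
        }
    }

    /** Drains the stream in 4-byte chunks, imitating many small reads during restore. */
    static long drainInSmallReads(InputStream in) throws IOException {
        byte[] chunk = new byte[4];
        long total = 0;
        int n;
        while ((n = in.read(chunk, 0, chunk.length)) != -1) {
            total += n;
        }
        return total;
    }

    /** Number of read calls hitting the underlying stream when no buffer is used. */
    static long callsUnbuffered(int size) {
        CountingInputStream raw = new CountingInputStream(new ByteArrayInputStream(new byte[size]));
        try {
            drainInSmallReads(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return raw.calls;
    }

    /** Number of read calls hitting the underlying stream behind a BufferedInputStream. */
    static long callsBuffered(int size, int bufferSize) {
        CountingInputStream raw = new CountingInputStream(new ByteArrayInputStream(new byte[size]));
        try (InputStream in = new BufferedInputStream(raw, bufferSize)) {
            drainInSmallReads(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return raw.calls;
    }

    public static void main(String[] args) {
        int size = 64 * 1024;
        System.out.println("unbuffered calls: " + callsUnbuffered(size));
        System.out.println("buffered calls:   " + callsBuffered(size, 8 * 1024));
    }
}
```

On a remote file system each call that reaches the underlying stream can turn into a separate I/O request, so the gap between the two counts is a rough proxy for the restore overhead. The sketch also hints at the cost raised in option (2): if the underlying stream is already buffered, the extra wrapper only adds another copy.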
