1996fanrui commented on pull request #13885: URL: https://github.com/apache/flink/pull/13885#issuecomment-731590386
> Yes, I think that makes sense. Adding a `fs.buffer` config option, maybe with a default value of 64k?
>
> * For local streams, it would be good as well, that is true.
> * For S3, we get that implicitly if we buffer the `HadoopDataInputStream`, because the S3 reads go through some Hadoop utilities and also wrap the stream with the `HadoopDataInputStream` in the end.

@StephanEwen Thanks for your reply. Based on the analysis above, I think `FSDataBufferedInputStream` can wrap any `InputStream` (see the sketch below). One open question remains: the state backends do not share a common interface for checkpoint reads, so how should the buffer size be passed? There are three options:

1. Wait for [FLINK-19465](https://issues.apache.org/jira/browse/FLINK-19465) to be completed, then continue the work in the current PR.
2. Do not make the buffer size configurable in this PR; hard-code it to 64 KB everywhere.
3. Add three duplicated `getFsReadBufferSize` methods to finish the current PR, then refactor them in [FLINK-19465](https://issues.apache.org/jira/browse/FLINK-19465) or in a new follow-up issue.

What do you think? @Myasuka @sjwiesman
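To make the wrapping idea concrete, here is a minimal sketch of what such a wrapper could look like. This is for illustration only and is not taken from the PR: it assumes the wrapper extends Flink's `FSDataInputStream`, buffers reads with `java.io.BufferedInputStream`, and tracks the logical position itself; the real class may handle seek and position differently.

```java
import org.apache.flink.core.fs.FSDataInputStream;

import java.io.BufferedInputStream;
import java.io.IOException;

/**
 * Illustrative sketch of a buffered wrapper around FSDataInputStream.
 * The class name comes from the discussion above; the details (position
 * tracking, seek handling) are assumptions, not Flink's actual code.
 */
public class FSDataBufferedInputStream extends FSDataInputStream {

    private final FSDataInputStream originalStream;
    private final int bufferSize;

    private BufferedInputStream bufferedStream;
    private long position; // logical position as seen by the caller

    public FSDataBufferedInputStream(FSDataInputStream originalStream, int bufferSize) throws IOException {
        this.originalStream = originalStream;
        this.bufferSize = bufferSize;
        this.bufferedStream = new BufferedInputStream(originalStream, bufferSize);
        this.position = originalStream.getPos();
    }

    @Override
    public void seek(long desired) throws IOException {
        // Seeking invalidates the read-ahead buffer, so re-wrap the stream.
        originalStream.seek(desired);
        bufferedStream = new BufferedInputStream(originalStream, bufferSize);
        position = desired;
    }

    @Override
    public long getPos() {
        return position;
    }

    @Override
    public int read() throws IOException {
        int b = bufferedStream.read();
        if (b >= 0) {
            position++;
        }
        return b;
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        int bytesRead = bufferedStream.read(b, off, len);
        if (bytesRead > 0) {
            position += bytesRead;
        }
        return bytesRead;
    }

    @Override
    public void close() throws IOException {
        bufferedStream.close();
    }
}
```

With a wrapper like this, option 2 would simply construct it with a hard-coded 64 KB buffer, while options 1 and 3 differ only in how the `bufferSize` argument is obtained from the configuration.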
