1996fanrui commented on pull request #13885: URL: https://github.com/apache/flink/pull/13885#issuecomment-731590386
> Yes, I think that makes sense. Adding a `fs.buffer` config option, maybe with a default value of 64k?
>
> * For local streams, it would be good as well, that is true.
> * For S3, we get that implicitly if we buffer the `HadoopDataInputStream`, because the S3 reads go through some Hadoop utilities and also wrap the stream with the `HadoopDataInputStream` in the end.

@StephanEwen Thanks for your reply. Based on the analysis above, I think `FSDataBufferedInputStream` can wrap any `InputStream` (see the sketch below). One open question remains: the state backends do not share a common interface for checkpoint reads, so how should the buffer size be passed? There are three options:

1. Wait for [FLINK-19465](https://issues.apache.org/jira/browse/FLINK-19465) to be completed, then continue the work in the current PR.
2. Do not make the buffer size configurable in this PR; hard-code it to 64 KB everywhere.
3. Add three duplicated `getFsReadBufferSize` methods to finish the current PR, then refactor them in [FLINK-19465](https://issues.apache.org/jira/browse/FLINK-19465) or in a new follow-up issue.

What do you think? @Myasuka @sjwiesman
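To make the wrapping idea concrete, here is a minimal sketch of what such a wrapper could look like. This is for illustration only and is not taken from the PR: it assumes the wrapper extends Flink's `FSDataInputStream`, buffers reads with `java.io.BufferedInputStream`, and tracks the logical position itself; the real class may handle seek and position differently.

```java
import org.apache.flink.core.fs.FSDataInputStream;

import java.io.BufferedInputStream;
import java.io.IOException;

/**
 * Illustrative sketch of a buffered wrapper around FSDataInputStream.
 * The class name comes from the discussion above; the details (position
 * tracking, seek handling) are assumptions, not Flink's actual code.
 */
public class FSDataBufferedInputStream extends FSDataInputStream {

    private final FSDataInputStream originalStream;
    private final int bufferSize;

    private BufferedInputStream bufferedStream;
    private long position; // logical position as seen by the caller

    public FSDataBufferedInputStream(FSDataInputStream originalStream, int bufferSize) throws IOException {
        this.originalStream = originalStream;
        this.bufferSize = bufferSize;
        this.bufferedStream = new BufferedInputStream(originalStream, bufferSize);
        this.position = originalStream.getPos();
    }

    @Override
    public void seek(long desired) throws IOException {
        // Seeking invalidates the read-ahead buffer, so re-wrap the stream.
        originalStream.seek(desired);
        bufferedStream = new BufferedInputStream(originalStream, bufferSize);
        position = desired;
    }

    @Override
    public long getPos() {
        return position;
    }

    @Override
    public int read() throws IOException {
        int b = bufferedStream.read();
        if (b >= 0) {
            position++;
        }
        return b;
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        int bytesRead = bufferedStream.read(b, off, len);
        if (bytesRead > 0) {
            position += bytesRead;
        }
        return bytesRead;
    }

    @Override
    public void close() throws IOException {
        bufferedStream.close();
    }
}
```

With a wrapper like this, option 2 would simply construct it with a hard-coded 64 KB buffer, while options 1 and 3 differ only in how the `bufferSize` argument is obtained from the configuration.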
