[jira] [Commented] (FLINK-26586) FileSystem uses unbuffered read I/O

Anton Kalashnikov (Jira) Tue, 19 Apr 2022 03:02:19 -0700


    [ 
https://issues.apache.org/jira/browse/FLINK-26586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17524217#comment-17524217
 ]


Anton Kalashnikov commented on FLINK-26586:
-------------------------------------------

As for the tests, that should not be a problem: you can always look at 
LocalFilesystemTest or nearby tests. These changes will be covered mainly by 
existing tests, since this is not new logic. What is more important here is to 
have a correct benchmark for this scenario 
(https://github.com/apache/flink-benchmarks), so that we can be sure our 
changes are not useless.

It is unfortunate that you cannot build Flink locally, since that will make it 
hard to run the tests. It is still up to you whether you are ready to take 
this task or not. You can try to prepare the PR (don't forget about our 
[contributor 
guide|https://flink.apache.org/contributing/how-to-contribute.html]) and then 
we can discuss further steps. Of course, you do not have to take this task; 
someone else may take it, but in that case it is not clear when it will 
happen (we need to think about priority).

> FileSystem uses unbuffered read I/O
> -----------------------------------
>
>                 Key: FLINK-26586
>                 URL: https://issues.apache.org/jira/browse/FLINK-26586
>             Project: Flink
>          Issue Type: Improvement
>          Components: API / State Processor, Connectors / FileSystem, Runtime 
> / Checkpointing
>    Affects Versions: 1.13.0, 1.14.0
>            Reporter: Matthias Schwalbe
>            Priority: Major
>         Attachments: BufferedFSDataInputStreamWrapper.java, 
> BufferedLocalFileSystem.java
>
>
> - I found out that, at least when using LocalFileSystem on a Windows system, 
> read I/O to load a savepoint is unbuffered,
>  - See example stack [1]
>  - i.e. in order to load only a long in a serializer, it needs to go into 
> kernel mode 8 times and load the 8 bytes one by one
>  - I coded a BufferedFSDataInputStreamWrapper that allows opting in to 
> buffered reads on any FileSystem implementation
>  - In our setting savepoint load is now 30 times faster
>  - I once saw a Jira ticket about improving savepoint load time in general 
> (lost the link unfortunately); maybe this approach can help with it
>  - not sure if HDFS has the same problem
>  - I can contribute my implementation of a BufferedFSDataInputStreamWrapper 
> which can be integrated in any 
> [1] unbuffered reads stack:
> read:207, FileInputStream (java.io)
> read:68, LocalDataInputStream (org.apache.flink.core.fs.local)
> read:50, FSDataInputStreamWrapper (org.apache.flink.core.fs)
> read:42, ForwardingInputStream (org.apache.flink.runtime.util)
> readInt:390, DataInputStream (java.io)
> deserialize:80, BytePrimitiveArraySerializer 
> (org.apache.flink.api.common.typeutils.base.array)
> next:298, FullSnapshotRestoreOperation$KeyGroupEntriesIterator 
> (org.apache.flink.runtime.state.restore)
> next:273, FullSnapshotRestoreOperation$KeyGroupEntriesIterator 
> (org.apache.flink.runtime.state.restore)
> restoreKVStateData:147, RocksDBFullRestoreOperation 
> (org.apache.flink.contrib.streaming.state.restore)
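
The effect described in the issue can be illustrated with plain java.io 
classes. The following is only a minimal sketch, not the attached 
BufferedFSDataInputStreamWrapper and not Flink code: BufferedReadSketch, 
CountingInputStream and readLongByteByByte are hypothetical names, and an 
in-memory stream stands in for the file. It counts how many calls reach the 
underlying stream when a long is assembled from single-byte reads, with and 
without a java.io.BufferedInputStream in between.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

class BufferedReadSketch {

    /** Hypothetical stand-in for an unbuffered file stream: counts every read call that reaches it. */
    static final class CountingInputStream extends FilterInputStream {
        int calls;

        CountingInputStream(InputStream in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            calls++;
            return super.read();
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            calls++;
            return in.read(b, off, len);
        }
    }

    /** Assembles a big-endian long from eight single-byte reads, as in the stack trace above. */
    static long readLongByteByByte(InputStream in) throws IOException {
        long value = 0;
        for (int i = 0; i < 8; i++) {
            int b = in.read();
            if (b < 0) {
                throw new EOFException();
            }
            value = (value << 8) | b;
        }
        return value;
    }

    /** Returns how many read calls reached the underlying stream while reading one long. */
    static int underlyingCalls(boolean buffered) {
        byte[] data = {0, 0, 0, 0, 0, 0, 0, 42};
        CountingInputStream counting = new CountingInputStream(new ByteArrayInputStream(data));
        InputStream in = buffered ? new BufferedInputStream(counting) : counting;
        try {
            long value = readLongByteByByte(in);
            if (value != 42L) {
                throw new IllegalStateException("unexpected value: " + value);
            }
            return counting.calls;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("unbuffered: " + underlyingCalls(false)); // prints "unbuffered: 8"
        System.out.println("buffered:   " + underlyingCalls(true));  // prints "buffered:   1"
    }
}
```

With a real file stream, each call that reaches the underlying stream would 
typically be a separate system call, which is why buffering can matter so much 
for many small reads. Whether it gives a comparable gain on other FileSystem 
implementations would still need to be measured.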



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
