[jira] [Commented] (FLINK-26586) FileSystem uses unbuffered read I/O

Rui Fan (Jira) Sat, 30 Sep 2023 23:41:08 -0700


    [ https://issues.apache.org/jira/browse/FLINK-26586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17770815#comment-17770815 ]


Rui Fan commented on FLINK-26586:
---------------------------------

Thanks [~masteryhx] and [~Matthias Schwalbe] for the discussion here.
{quote}it's better if we could support to configure the buffer size.
{quote}
+1

 
{quote}my implementation is a façade to potentially all filesystem 
implementations, I think only the local filesystem implementation needs it, so 
we could also map to the Java buffered local I/O implementation instead of 
using java.io.FileInputStream

Some filesystems such as HDFS and S3 should also have their own internal 
buffer implementation, IIUC.
I am not sure whether just mapping to a Java buffered stream would meet the 
requirement.
{quote}
HadoopFileSystem has its own buffer, but I found that its performance isn't 
good. You can find a detailed analysis in these comments:
 * [https://github.com/apache/flink/pull/13885#issuecomment-723504501]
 * [https://github.com/apache/flink/pull/13885#issuecomment-724092762]

So it might be better to support buffers for all file systems, right?
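
To make the buffer-size discussion concrete, here is a minimal sketch using only plain java.io. It is not the attached BufferedFSDataInputStreamWrapper and does not use Flink's FileSystem API. The class names CountingInputStream and BufferedReadSketch are hypothetical, and the counter tracks calls that reach the wrapped stream of an in-memory source, not real kernel transitions. It reads one 8-byte value byte by byte, once directly and once through a BufferedInputStream with a configurable buffer size.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

/** Hypothetical helper (not part of Flink): counts read calls that reach the wrapped stream. */
class CountingInputStream extends FilterInputStream {
    int calls;

    CountingInputStream(InputStream in) {
        super(in);
    }

    @Override
    public int read() throws IOException {
        calls++;
        return in.read();
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        calls++;
        return in.read(b, off, len);
    }
}

/** Hypothetical sketch: compares direct single-byte reads with opt-in buffered reads. */
class BufferedReadSketch {

    /**
     * Reads 8 bytes one at a time and returns how many calls reached the source.
     * A bufferSize of 0 or less means "no buffering" (an assumption of this sketch).
     */
    static int underlyingReads(int bufferSize) {
        CountingInputStream source =
                new CountingInputStream(new ByteArrayInputStream(new byte[8]));
        InputStream in =
                bufferSize > 0 ? new BufferedInputStream(source, bufferSize) : source;
        try {
            for (int i = 0; i < 8; i++) {
                in.read(); // one byte per call, as in the unbuffered stack of the issue
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return source.calls;
    }

    public static void main(String[] args) {
        System.out.println("unbuffered: " + underlyingReads(0));         // prints "unbuffered: 8"
        System.out.println("buffer of 4: " + underlyingReads(4));        // prints "buffer of 4: 2"
        System.out.println("buffer of 4096: " + underlyingReads(4096));  // prints "buffer of 4096: 1"
    }
}
```

The buffer size directly controls how many calls reach the underlying stream, which is why a configurable size seems useful. Whether this carries over to file systems that already buffer internally, such as HadoopFileSystem, would need measuring.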

> FileSystem uses unbuffered read I/O
> -----------------------------------
>
>                 Key: FLINK-26586
>                 URL: https://issues.apache.org/jira/browse/FLINK-26586
>             Project: Flink
>          Issue Type: Improvement
>          Components: API / State Processor, Connectors / FileSystem, Runtime 
> / Checkpointing
>    Affects Versions: 1.13.0, 1.14.0
>            Reporter: Matthias Schwalbe
>            Priority: Major
>         Attachments: BufferedFSDataInputStreamWrapper.java, 
> BufferedLocalFileSystem.java
>
>
> - I found out that, at least when using LocalFileSystem on a Windows system, 
> read I/O to load a savepoint is unbuffered,
>  - See example stack [1]
>  - i.e. in order to load only a long in a serializer, it needs to go into 
> kernel mode 8 times and load the 8 bytes one by one
>  - I coded a BufferedFSDataInputStreamWrapper that allows opting in to 
> buffered reads on any FileSystem implementation
>  - In our setting, savepoint load is now 30 times faster
>  - I once saw a Jira ticket about improving savepoint load time in general 
> (lost the link unfortunately); maybe this approach can help with it
>  - not sure if HDFS has the same problem
>  - I can contribute my implementation of a BufferedFSDataInputStreamWrapper 
> which can be integrated in any 
> [1] unbuffered reads stack:
> read:207, FileInputStream (java.io)
> read:68, LocalDataInputStream (org.apache.flink.core.fs.local)
> read:50, FSDataInputStreamWrapper (org.apache.flink.core.fs)
> read:42, ForwardingInputStream (org.apache.flink.runtime.util)
> readInt:390, DataInputStream (java.io)
> deserialize:80, BytePrimitiveArraySerializer 
> (org.apache.flink.api.common.typeutils.base.array)
> next:298, FullSnapshotRestoreOperation$KeyGroupEntriesIterator 
> (org.apache.flink.runtime.state.restore)
> next:273, FullSnapshotRestoreOperation$KeyGroupEntriesIterator 
> (org.apache.flink.runtime.state.restore)
> restoreKVStateData:147, RocksDBFullRestoreOperation 
> (org.apache.flink.contrib.streaming.state.restore)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
