WenyXu commented on code in PR #3734:
URL: 
https://github.com/apache/incubator-opendal/pull/3734#discussion_r1421711547


##########
core/src/docs/rfcs/3734_buffered_reader.md:
##########
@@ -0,0 +1,113 @@
+- Proposal Name: `buffered_reader`
+- Start Date: 2023-12-10
+- RFC PR: 
[apache/incubator-opendal#3574](https://github.com/apache/incubator-opendal/pull/3734)
+- Tracking Issue: 
[apache/incubator-opendal#3575](https://github.com/apache/incubator-opendal/issues/3735)
+
+# Summary
+
+Amortize overhead of IO.
+
+# Motivation
+
+The aim is to amortize the overhead of IO. Some scenarios, such as importing 
data to the database, typically require supporting multiple file formats, and 
the size of files varies(very large or very small). Most readers of these 
formats may behave similarly to the `ParquetStream`; they read and decode data 
piece after piece.
+
+```
+File
+┌───┬───┬───┬──┐
+│   │   │   │  │
+└───┴───┴───┘▲─┘
+             │
+             │ 1. SeekFromEnd(8)
+
+               2. ReadToEnd(limit)
+```
+
+However, OpenDAL will yield an IO request after `seek()` and `read()` are 
invoked. For storage services such as S3, The 
[research](https://www.vldb.org/pvldb/vol16/p2769-durner.pdf) shows that the 
8MiB~16MiB is the preferred IO size, If the IO size is too small, the TTFB 
dominates the overall runtime; too large will reach the bandwidth limit.
+
+Therefore, This RFC proposes a buffered reader to amortize the overhead of IO.
+
+# Guide-level explanation
+
+For users who want to buffered reader, they will call the new API `buffer`. 
And the default behavior remains unchanged, so users using `op.reader_with()` 
are not affected. The `buffer` function will take a number as input, and this 
number will represent the maximum buffer capability the reader is able to use. 
+
+```rust
+op.reader_with(path).buffer(32 * 1024 * 1024).await
+```
+
+If a user prefers to customize the `Preferred IO Size`, it provides a 
`read_size` option (default is 8MiB):

Review Comment:
   Maybe let the `Accessor`'s metadata provide the default value.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to