This is an automated email from the ASF dual-hosted git repository.

xuanwo pushed a commit to branch writer-sink
in repository https://gitbox.apache.org/repos/asf/incubator-opendal.git

commit 189705573d019516a975c61b4df14332b4fefe80
Author: Xuanwo <[email protected]>
AuthorDate: Sun Apr 23 17:11:15 2023 +0800

    rfc: Add writer sink api

    Signed-off-by: Xuanwo <[email protected]>
---
 core/src/docs/rfcs/0000_writer_sink_api.md | 106 +++++++++++++++++++++++++++++
 1 file changed, 106 insertions(+)

diff --git a/core/src/docs/rfcs/0000_writer_sink_api.md b/core/src/docs/rfcs/0000_writer_sink_api.md
new file mode 100644
index 00000000..155081f5
--- /dev/null
+++ b/core/src/docs/rfcs/0000_writer_sink_api.md
@@ -0,0 +1,106 @@
+- Proposal Name: `writer_sink_api`
+- Start Date: 2023-04-23
+- RFC PR: [apache/incubator-opendal#0000](https://github.com/apache/incubator-opendal/pull/0000)
+- Tracking Issue: [apache/incubator-opendal#0000](https://github.com/apache/incubator-opendal/issues/0000)
+
+# Summary
+
+Include a `sink` API within the `Writer` to enable streaming writes.
+
+# Motivation
+
+OpenDAL does not support streaming data uploads. Users must first load the data into memory and then send it to the `writer`.
+
+```rust
+let bs = balabala();
+w.write(bs).await?;
+let bs = daladala();
+w.write(bs).await?;
+...
+w.close().await?;
+```
+
+This approach has two main drawbacks:
+
+- high memory usage, as reported in issue #1821 on GitHub
+- low performance, because user data must be buffered before it is sent over the network
+
+To address these issues, it would be beneficial for OpenDAL to provide an API that allows users to pass a stream or reader directly into the writer.
+
+# Guide-level explanation
+
+I propose to add the following API to `Writer`:
+
+```rust
+impl Writer {
+    pub async fn copy_from<R>(&mut self, size: u64, r: R) -> Result<()>
+    where
+        R: futures::AsyncRead + Send + Sync + 'static;
+
+    pub async fn pipe_from<S>(&mut self, size: u64, s: S) -> Result<()>
+    where
+        S: futures::TryStream + Send + Sync + 'static,
+        Bytes: From<S::Ok>;
+}
+```
+
+Users can now upload data in a streaming way:
+
+```rust
+// Start writing the 5 TiB file.
+let mut w = op.writer_with(
+    OpWrite::default()
+        .with_content_length(5 * 1024 * 1024 * 1024 * 1024));
+
+let r = balabala();
+// Send to the network directly, without an in-memory buffer.
+w.copy_from(size, r).await?;
+// repeat...
+...
+
+// Close the writer once we are done!
+w.close().await?;
+```
+
+The underlying services will handle this stream in the most efficient way possible.
+
+# Reference-level explanation
+
+To support `Writer::copy_from` and `Writer::pipe_from`, we will add a new API called `sink` inside `oio::Writer`:
+
+```rust
+#[async_trait]
+pub trait Write: Unpin + Send + Sync {
+    async fn sink(&mut self, size: u64, s: Box<dyn futures::TryStream<Ok=Bytes> + Send + Sync>) -> Result<()>;
+}
+```
+
+OpenDAL converts the user-provided reader or stream into a byte stream for `oio::Write`. Services that support streaming upload natively can pass the stream through directly. Services that do not can call `write` repeatedly until the entire stream has been written.
+
+# Drawbacks
+
+None.
+
+# Rationale and alternatives
+
+## What's the difference between `OpWrite::content_length` and the `sink` size?
+
+The `OpWrite::content_length` parameter specifies the total length of the file to be written, while the `size` argument in the `sink` API indicates the size of the reader or stream provided in that call. Certain services may optimize by writing all content in a single request if `content_length` equals the given `size`.
+
+# Prior art
+
+None
+
+# Unresolved questions
+
+None
+
+# Future possibilities
+
+## Retry for the `sink` API
+
+It's impossible to retry the `sink` API itself, but we can provide a wrapper that retries the stream's call to `next`. If we hit a retryable error, we can call `next` again using a crate like `backon`.
+
+## Blocking support for sink
+
+We will add async support first.
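
The fallback mentioned in the reference-level explanation, where a service without native streaming support calls `write` repeatedly, can be sketched roughly as follows. This is a blocking, std-only analog and not OpenDAL's actual API: the `Write` trait here is a simplified stand-in for `oio::Write`, an `Iterator` replaces `futures::TryStream`, and `SinkError` and `MemoryWriter` are hypothetical names invented for illustration. Rejecting a stream whose length differs from `size` is also an assumption; the RFC does not say how a mismatch should be handled.

```rust
/// Hypothetical error type for this sketch (not OpenDAL's `Error`).
#[derive(Debug, PartialEq)]
pub enum SinkError {
    /// The stream yielded a different number of bytes than `size` promised.
    SizeMismatch { expected: u64, actual: u64 },
    /// The source stream itself failed.
    Source(String),
}

/// Simplified, blocking stand-in for `oio::Write`.
pub trait Write {
    /// Write one in-memory chunk.
    fn write(&mut self, bs: &[u8]) -> Result<(), SinkError>;

    /// Default `sink`: drain the stream and forward each chunk to `write`.
    /// A service with native streaming upload would override this method.
    fn sink<I>(&mut self, size: u64, chunks: I) -> Result<(), SinkError>
    where
        I: Iterator<Item = Result<Vec<u8>, SinkError>>,
    {
        let mut written = 0u64;
        for chunk in chunks {
            // Propagate source errors as soon as they appear.
            let bs = chunk?;
            self.write(&bs)?;
            written += bs.len() as u64;
        }
        // Assumption: a length mismatch is reported as an error.
        if written != size {
            return Err(SinkError::SizeMismatch {
                expected: size,
                actual: written,
            });
        }
        Ok(())
    }
}

/// In-memory writer used only to exercise the default `sink`.
#[derive(Default)]
pub struct MemoryWriter {
    pub buf: Vec<u8>,
    pub write_calls: usize,
}

impl Write for MemoryWriter {
    fn write(&mut self, bs: &[u8]) -> Result<(), SinkError> {
        self.buf.extend_from_slice(bs);
        self.write_calls += 1;
        Ok(())
    }
}
```

With this shape, only chunk-sized buffers are held at any time, which is the memory behavior the motivation section asks for. A service that can stream natively would replace the default body rather than loop over `write`.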

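
The retry idea from the "Future possibilities" section, wrapping the stream so that a retryable error triggers another call to `next`, might look something like the sketch below. It is again a blocking, std-only analog: `RetryNext` and `StreamError` are hypothetical names, an `Iterator` stands in for the async stream, and there is no backoff between attempts (a real wrapper would likely delay using a crate such as `backon`, as the RFC suggests). It also assumes the underlying stream can resume after a temporary error.

```rust
/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
pub enum StreamError {
    /// Worth retrying, e.g. a transient network failure.
    Temporary(String),
    /// Not worth retrying; surfaced to the caller immediately.
    Permanent(String),
}

/// Wraps a chunk source and retries `next` on temporary errors.
pub struct RetryNext<I> {
    inner: I,
    /// Maximum number of extra `next` calls per yielded item.
    max_retries: usize,
}

impl<I> RetryNext<I> {
    pub fn new(inner: I, max_retries: usize) -> Self {
        Self { inner, max_retries }
    }
}

impl<I> Iterator for RetryNext<I>
where
    I: Iterator<Item = Result<Vec<u8>, StreamError>>,
{
    type Item = Result<Vec<u8>, StreamError>;

    fn next(&mut self) -> Option<Self::Item> {
        let mut retries = 0;
        loop {
            // `?` ends iteration when the inner source is exhausted.
            match self.inner.next()? {
                // Temporary error with budget left: call `next` again.
                Err(StreamError::Temporary(_)) if retries < self.max_retries => {
                    retries += 1;
                }
                // Data, permanent errors, or exhausted budget: pass through.
                other => return Some(other),
            }
        }
    }
}
```

Because the retry happens inside `next`, the consumer of the stream (the `sink` implementation) never sees temporary failures that were recovered within the retry budget.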