This is an automated email from the ASF dual-hosted git repository.
xuanwo pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/incubator-opendal.git
The following commit(s) were added to refs/heads/main by this push:
new 33829f47a6 RFC-3898: Concurrent Writer (#3898)
33829f47a6 is described below
commit 33829f47a6e37f0ffdde67ae9c66a2679a87a935
Author: Weny Xu <[email protected]>
AuthorDate: Thu Jan 4 15:43:02 2024 +0900
RFC-3898: Concurrent Writer (#3898)
* feat: add concurrent writer rfc
* chore: apply suggestions from CR
* chore: apply suggestions from CR
---
core/src/docs/rfcs/3898_concurrent_writer.md | 66 ++++++++++++++++++++++++++++
core/src/docs/rfcs/mod.rs | 3 ++
2 files changed, 69 insertions(+)
diff --git a/core/src/docs/rfcs/3898_concurrent_writer.md
b/core/src/docs/rfcs/3898_concurrent_writer.md
new file mode 100644
index 0000000000..d95106efcf
--- /dev/null
+++ b/core/src/docs/rfcs/3898_concurrent_writer.md
@@ -0,0 +1,66 @@
+- Proposal Name: `concurrent_writer`
+- Start Date: 2024-01-02
+- RFC PR:
[apache/incubator-opendal#3898](https://github.com/apache/incubator-opendal/pull/3898)
+- Tracking Issue:
[apache/incubator-opendal#3899](https://github.com/apache/incubator-opendal/issues/3899)
+
+# Summary
+
+Enhance the `Writer` by adding concurrent write capabilities.
+
+# Motivation
+
+Certain services, such as S3, GCS, and AzBlob, offer the `multi_write`
functionality, allowing users to perform multiple write operations to
upload large files. If a service supports `multi_write`, the
[Capability::write_can_multi](https://opendal.apache.org/docs/rust/opendal/struct.Capability.html#structfield.write_can_multi)
metadata should be set to `true`.
+```rust
+ let mut writer = op.writer("path/to").await?; // A writer that supports `multi_write`.
+ writer.write(part0).await?;
+ writer.write(part1).await?; // Starts uploading only after `part0` has finished.
+ writer.close().await?;
+```
+Currently, when invoking a `Writer` that supports the `multi_write`
functionality, multiple writes are processed serially, without fully leveraging
the potential for improved throughput through concurrent uploads. We should
enhance support to allow concurrent processing of multiple write operations.
+
+
+# Guide-level explanation
+
+Users who want concurrent writes call the new `concurrent` API. The default
behavior remains unchanged, so users of `op.writer_with()` are not affected.
The `concurrent` function takes a number as input, which represents the
maximum number of concurrent write tasks the writer can perform.
+
+- If `concurrent` is set to 0 or 1, the writer keeps the default behavior
(writes serially).
+- If `concurrent` is set to a number larger than 1, the writer uploads up to
`concurrent` write tasks concurrently and allows users to initiate additional
write tasks without waiting for the previous write operation to complete, as
long as the inner task queue still has available slots.
+
+The concurrent write feature operates independently of other features.
+
+```rust
+let mut w = op.writer_with(path).concurrent(8).await?;
+w.write(part0).await?;
+w.write(part1).await?; // `write` won't wait for part0.
+w.close().await?; // `close` will make sure all parts are finished.
+```
+
+# Reference-level explanation
+
+S3 and similar services use `MultipartUploadWriter`, while GCS uses
`RangeWriter`. We can enhance these services by adding concurrent write
features to them. A `concurrent` field of type `usize` will be introduced to
`OpWrite` to allow the user to set the maximum number of concurrent write
tasks. For other services that don't support `multi_write`, setting the
`concurrent` parameter will have no effect, maintaining the default behavior.
+
+This feature will be implemented in the `MultipartUploadWriter` and
`RangeWriter`, which will utilize a `ConcurrentFutures<WriteTask>` as a task
queue to store concurrent write tasks.
+
+When the upper layer invokes `poll_write`, the `Writer` pushes the write to
the task queue (`ConcurrentFutures<WriteTask>`) if there are available slots,
and then relinquishes control back to the upper layer. This allows up to
`concurrent` write tasks to be uploaded concurrently without waiting. If the
task queue is full, the `Writer` waits for the first task to yield results.
+
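The queueing behavior described above can be illustrated with a minimal, standard-library-only sketch. It is not the proposed implementation: `BoundedWriter` and its methods are hypothetical names, threads stand in for the futures held by `ConcurrentFutures<WriteTask>`, and the "upload" only records the part number and length. It shows only the rule that a write is queued while slots are free and otherwise waits for the oldest task first.

```rust
use std::collections::VecDeque;
use std::thread::{self, JoinHandle};

/// Hypothetical stand-in for a writer with a bounded task queue.
/// Threads replace futures so the sketch needs no async runtime.
struct BoundedWriter {
    concurrent: usize,
    tasks: VecDeque<JoinHandle<(usize, usize)>>,
    next_part: usize,
    uploaded: Vec<(usize, usize)>, // (part number, byte length)
}

impl BoundedWriter {
    fn new(concurrent: usize) -> Self {
        Self {
            // Assumption: 0 is treated like 1, so at most one task runs at a time.
            concurrent: concurrent.max(1),
            tasks: VecDeque::new(),
            next_part: 0,
            uploaded: Vec::new(),
        }
    }

    /// Number of tasks currently held in the queue.
    fn in_flight(&self) -> usize {
        self.tasks.len()
    }

    /// Queue a part; if the queue is full, wait for the oldest task first.
    fn write(&mut self, part: Vec<u8>) {
        if self.tasks.len() >= self.concurrent {
            self.wait_first();
        }
        let number = self.next_part;
        self.next_part += 1;
        // A real writer would upload the part here; this only reports its size.
        self.tasks.push_back(thread::spawn(move || (number, part.len())));
    }

    /// Block until the oldest queued task has finished and record its result.
    fn wait_first(&mut self) {
        if let Some(handle) = self.tasks.pop_front() {
            self.uploaded.push(handle.join().expect("upload task panicked"));
        }
    }

    /// Drain the queue so that every part is finished before returning.
    fn close(mut self) -> Vec<(usize, usize)> {
        while !self.tasks.is_empty() {
            self.wait_first();
        }
        self.uploaded
    }
}

fn main() {
    let mut w = BoundedWriter::new(2);
    for part in [vec![0u8; 3], vec![0u8; 5], vec![0u8; 7]] {
        w.write(part); // The third write waits for part 0 because the queue is full.
    }
    println!("{:?}", w.close()); // prints [(0, 3), (1, 5), (2, 7)]
}
```

Because results are collected from the front of the queue, parts are reported in the order they were written, mirroring the "wait for the first task" rule rather than completion order.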
+# Drawbacks
+
+- More memory usage
+- More concurrent connections
+
+# Rationale and alternatives
+
+None
+
+# Prior art
+
+None
+
+# Unresolved questions
+
+None
+
+# Future possibilities
+
+- Adding `write_at` for `fs`.
+- Use `ConcurrentFutureUnordered` instead of `ConcurrentFutures`.
diff --git a/core/src/docs/rfcs/mod.rs b/core/src/docs/rfcs/mod.rs
index 4272b128b8..57bc6824d4 100644
--- a/core/src/docs/rfcs/mod.rs
+++ b/core/src/docs/rfcs/mod.rs
@@ -219,3 +219,6 @@ pub mod rfc_3574_concurrent_stat_in_list {}
#[doc = include_str!("3734_buffered_reader.md")]
pub mod rfc_3734_buffered_reader {}
+
+#[doc = include_str!("3898_concurrent_writer.md")]
+pub mod rfc_3898_concurrent_writer {}