This is an automated email from the ASF dual-hosted git repository.

meteorgan pushed a commit to branch rfc-read_returns_metadata
in repository https://gitbox.apache.org/repos/asf/opendal.git

commit db77c0e4dc19136714193077cd202db59b4060d1
Author: meteorgan <[email protected]>
AuthorDate: Thu Mar 20 22:39:27 2025 +0800

    RFC: Read Returns Metadata
---
 core/src/docs/rfcs/0000_read_returns_metadata.md | 132 +++++++++++++++++++++++
 1 file changed, 132 insertions(+)

diff --git a/core/src/docs/rfcs/0000_read_returns_metadata.md 
b/core/src/docs/rfcs/0000_read_returns_metadata.md
new file mode 100644
index 000000000..e2220132e
--- /dev/null
+++ b/core/src/docs/rfcs/0000_read_returns_metadata.md
@@ -0,0 +1,132 @@
+- Proposal Name: `read_returns_metadata`
+- Start Date: 2025-03-24
+- RFC PR: [apache/opendal#0000](https://github.com/apache/opendal/pull/0000)
+- Tracking Issue: 
[apache/opendal#0000](https://github.com/apache/opendal/issues/0000)
+
+# Summary
+
+Enhance read operations by returning metadata along with data in read 
operations.
+
+# Motivation
+
+Currently, read operations (`read`, `read_with`, `reader`, `reader_with`) only 
return the data content. Users who need metadata 
+during reads (like `Content-Type`, `ETag`, `version_id`, etc.) must make an 
additional `stat()` call. This is inefficient and 
+can lead to race conditions if the file is modified between the read and stat 
operations.
+
+Many storage services (like S3, GCS, Azure Blob) return metadata in their read 
responses. For example, S3's GetObject API returns 
+important metadata like `ContentType`, `ETag`, `VersionId`, `LastModified`, 
etc. We should expose this information to users 
+directly during read operations.
+
+# Guide-level explanation
+
+The read operations will be enhanced to return both data and metadata:
+
+```rust
+// Before
+let data = op.read("path/to/file").await?;
+let meta = op.stat("path/to/file").await?;
+if let Some(content_type) = meta.content_type() {
+    println!("Content-Type: {}", content_type);
+}
+
+// After
+let (data, meta) = op.read("path/to/file").await?;
+if let Some(content_type) = meta.content_type() {
+    println!("Content-Type: {}", content_type);
+}
+```
+
+For reader operations:
+
+```rust
+// Before
+let data = op.reader("path/to/file").await?.read(..).await?;
+let meta = op.stat("path/to/file").await?;
+if let Some(etag) = meta.etag() {
+    println!("ETag: {}", etag);
+}
+
+// After
+let reader = op.reader("path/to/file").await?;
+let (data, meta) = reader.read(..).await?;
+if let Some(etag) = meta.etag() {
+    println!("ETag: {}", etag);
+}
+```
+
+The behavior remains backward compatible if users don't need the metadata - 
they can simply ignore the metadata part of the return tuple.
+
+# Reference-level explanation
+
+## Changes to `Operator` API
+
+The following functions will be modified to return `Result<(Buffer, 
Metadata)>` instead of `Result<Buffer>`:
+
+- `read()`
+- `read_with()`
+
+## Changes to `Reader` API
+
+- `read()` will be modified to return `Result<(Buffer, Metadata)>` instead of 
`Result<Buffer>`.
+- `fetch()` will be modified to return `Result<(Vec<Buffer>, Metadata)>` 
instead of `Result<Buffer>`.
+
+## Changes to trait `oio::Read`
+
+The `Read` trait will be modified to include a new function `metadata()` that 
returns metadata.
+
+```rust
+pub trait Read {
+    // Existing functions...
+    
+    fn metadata(&self) -> Metadata;
+}
+```
+
+## Changes to struct `http_util::HttpBody`
+
+The `HttpBody` struct will be modified to include a new field for metadata.
+
+
+
+## Implementation Details
+
+For services that return metadata in their read responses:
+- The metadata will be captured from the service response.
+- All available fields (content_type, etag, version_id, last_modified, etc.) 
will be populated
+
+For services that don't return metadata in read responses:
+- for `fs`: we can use `stat` to retrieve the metadata before returning. Since 
the metadata is cached by the kernel, this should be efficient
+- for other services: A default metadata object will be returned
+
+Special considerations:
+- We should always return total object size in the metadata, even if it's not 
part of the read response
+- For range reads, the metadata should reflect the full object's properties 
(like total size) rather than the range
+- For versioned objects, the metadata should include version information if 
available
+
+# Drawbacks
+
+- Minor breaking change for users who explicitly type the return value of read 
operations
+- Additional memory overhead for storing metadata during reads
+- Potential complexity in handling metadata for range reads
+
+# Rationale and alternatives
+
+- Provides a clean, consistent API that matches `write_returns_metadata`
+- Improves performance by avoiding additional stat calls
+- Aligns with common storage service APIs (S3, GCS, Azure)
+
+# Prior art
+
+Similar patterns exist in other storage SDKs:
+
+- `object_store` crate returns metadata in `GetResult` after calling `get_opts`
+- AWS S3 SDK returns comprehensive metadata in `GetObjectOutput`
+- Azure Blob SDK returns properties and metadata in `DownloadResponse`
+
+# Unresolved questions
+
+None
+
+# Future possibilities
+
+None

Reply via email to