Xuanwo commented on code in PR #5871: URL: https://github.com/apache/opendal/pull/5871#discussion_r2014496668
########## core/src/docs/rfcs/5871_read_returns_metadata.md: ########## @@ -0,0 +1,117 @@ +- Proposal Name: `read_returns_metadata` +- Start Date: 2025-03-24 +- RFC PR: [apache/opendal#5871](https://github.com/apache/opendal/pull/5871) +- Tracking Issue: [apache/opendal#5872](https://github.com/apache/opendal/issues/5872) + +# Summary + +Enhance read operations by returning metadata along with data in read operations. + +# Motivation + +Currently, read operations (`read`, `read_with`, `reader`, `reader_with`) only return the data content. Users who need metadata +during reads (like `Content-Type`, `ETag`, `version_id`, etc.) must make an additional `stat()` call. This is inefficient and +can lead to race conditions if the file is modified between the read and stat operations. + +Many storage services (like S3, GCS, Azure Blob) return metadata in their read responses. For example, S3's GetObject API returns +important metadata like `ContentType`, `ETag`, `VersionId`, `LastModified`, etc. We should expose this information to users +directly during read operations. + +# Guide-level explanation + +For `reader` API, we will introduce a new method `metadata()` that returns metadata: + +```rust +// Before +let data = op.reader("path/to/file").await?.read(..).await?; +let meta = op.stat("path/to/file").await?; +if let Some(etag) = meta.etag() { + println!("ETag: {}", etag); +} + +// After +let reader = op.reader("path/to/file").await?; +let meta = reader.metadata(); +if let Some(etag) = meta.etag() { + println!("ETag: {}", etag); +} +let data = reader.read(..).await?; +``` +The new API will be optional, and users can still use the existing `reader` methods without any changes. + +For backward compatibility and to minimize migration costs, We won't change the existing `read` API. Anyone who wants +to obtain metadata during reading can use the new reader operations instead. + +# Reference-level explanation + +## Changes to `Reader` API + +The `impl Reader` will be modified to include a new function `metadata()` that returns metadata. + +```rust +impl Reader { + // Existing fields... + + fn metadata(&self) -> &Metadata {} +} +``` + +## Changes to trait `oio::Read` + +The `Read` trait will be modified to include a new function `metadata()` that returns metadata. + +```rust +pub trait Read { + // Existing functions... + + fn metadata(&self) -> Metadata; Review Comment: It's an interesting trick involving Rust's `Send` and `Sync`. If a trait only provides `&mut self` APIs and there's no way to access the struct itself, we can safely declare it as `Sync` when it's `Send` by self because it cannot be accessed from multiple threads simultaneously. This can eliminate some unnecessary `Mutex<T>` instances when implementing `oio::Read`. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
