Xuanwo commented on code in PR #5871: URL: https://github.com/apache/opendal/pull/5871#discussion_r2011264468
########## core/src/docs/rfcs/5871_read_returns_metadata.md: ########## @@ -0,0 +1,132 @@ +- Proposal Name: `read_returns_metadata` +- Start Date: 2025-03-24 +- RFC PR: [apache/opendal#5871](https://github.com/apache/opendal/pull/5871) +- Tracking Issue: [apache/opendal#5872](https://github.com/apache/opendal/issues/5872) + +# Summary + +Enhance read operations by returning metadata along with data in read operations. + +# Motivation + +Currently, read operations (`read`, `read_with`, `reader`, `reader_with`) only return the data content. Users who need metadata +during reads (like `Content-Type`, `ETag`, `version_id`, etc.) must make an additional `stat()` call. This is inefficient and +can lead to race conditions if the file is modified between the read and stat operations. + +Many storage services (like S3, GCS, Azure Blob) return metadata in their read responses. For example, S3's GetObject API returns +important metadata like `ContentType`, `ETag`, `VersionId`, `LastModified`, etc. We should expose this information to users +directly during read operations. + +# Guide-level explanation + +The read operations will be enhanced to return both data and metadata: + +```rust +// Before +let data = op.read("path/to/file").await?; +let meta = op.stat("path/to/file").await?; +if let Some(content_type) = meta.content_type() { + println!("Content-Type: {}", content_type); +} + +// After +let (data, meta) = op.read("path/to/file").await?; +if let Some(content_type) = meta.content_type() { + println!("Content-Type: {}", content_type); +} +``` + +For reader operations: + +```rust +// Before +let data = op.reader("path/to/file").await?.read(..).await?; +let meta = op.stat("path/to/file").await?; +if let Some(etag) = meta.etag() { + println!("ETag: {}", etag); +} + +// After +let reader = op.reader("path/to/file").await?; +let (data, meta) = reader.read(..).await?; Review Comment: This design can breaks every call to `read`, which I don't think is a good idea for us. More importantly, not all users are interested in the metadata returned by `read`. Requiring all users to accommodate these breaking changes doesn't make sense to me. As an alternative, how about add API in this way: ```rust impl Reader { fn metadata(&self) -> &Metadata; } ``` Users who want to know about `etag` / `content_length` can use in this way: ```rust let reader = op.reader(path).await?; let etag = reader.metadata().etag(); ``` This change will also be compatible. No API breaking. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
