Xuanwo commented on code in PR #5871:
URL: https://github.com/apache/opendal/pull/5871#discussion_r2013393370


##########
core/src/docs/rfcs/5871_read_returns_metadata.md:
##########
@@ -0,0 +1,117 @@
+- Proposal Name: `read_returns_metadata`
+- Start Date: 2025-03-24
+- RFC PR: [apache/opendal#5871](https://github.com/apache/opendal/pull/5871)
+- Tracking Issue: 
[apache/opendal#5872](https://github.com/apache/opendal/issues/5872)
+
+# Summary
+
+Enhance read operations by returning metadata along with data in read 
operations.
+
+# Motivation
+
+Currently, read operations (`read`, `read_with`, `reader`, `reader_with`) only 
return the data content. Users who need metadata 
+during reads (like `Content-Type`, `ETag`, `version_id`, etc.) must make an 
additional `stat()` call. This is inefficient and 
+can lead to race conditions if the file is modified between the read and stat 
operations.
+
+Many storage services (like S3, GCS, Azure Blob) return metadata in their read 
responses. For example, S3's GetObject API returns 
+important metadata like `ContentType`, `ETag`, `VersionId`, `LastModified`, 
etc. We should expose this information to users 
+directly during read operations.
+
+# Guide-level explanation
+
+For `reader` API, we will introduce a new method `metadata()` that returns 
metadata:
+
+```rust
+// Before
+let data = op.reader("path/to/file").await?.read(..).await?;
+let meta = op.stat("path/to/file").await?;
+if let Some(etag) = meta.etag() {
+    println!("ETag: {}", etag);
+}
+
+// After
+let reader = op.reader("path/to/file").await?;
+let meta = reader.metadata();
+if let Some(etag) = meta.etag() {
+    println!("ETag: {}", etag);
+}
+let data = reader.read(..).await?;
+```
+The new API will be optional, and users can still use the existing `reader` 
methods without any changes.

Review Comment:
   I feel that "optional" is not the right word here. This API is not optional; 
it's additive.



##########
core/src/docs/rfcs/5871_read_returns_metadata.md:
##########
@@ -0,0 +1,117 @@
+- Proposal Name: `read_returns_metadata`
+- Start Date: 2025-03-24
+- RFC PR: [apache/opendal#5871](https://github.com/apache/opendal/pull/5871)
+- Tracking Issue: 
[apache/opendal#5872](https://github.com/apache/opendal/issues/5872)
+
+# Summary
+
+Enhance read operations by returning metadata along with data in read 
operations.
+
+# Motivation
+
+Currently, read operations (`read`, `read_with`, `reader`, `reader_with`) only 
return the data content. Users who need metadata 
+during reads (like `Content-Type`, `ETag`, `version_id`, etc.) must make an 
additional `stat()` call. This is inefficient and 
+can lead to race conditions if the file is modified between the read and stat 
operations.
+
+Many storage services (like S3, GCS, Azure Blob) return metadata in their read 
responses. For example, S3's GetObject API returns 
+important metadata like `ContentType`, `ETag`, `VersionId`, `LastModified`, 
etc. We should expose this information to users 
+directly during read operations.
+
+# Guide-level explanation
+
+For `reader` API, we will introduce a new method `metadata()` that returns 
metadata:
+
+```rust
+// Before
+let data = op.reader("path/to/file").await?.read(..).await?;
+let meta = op.stat("path/to/file").await?;
+if let Some(etag) = meta.etag() {
+    println!("ETag: {}", etag);
+}
+
+// After
+let reader = op.reader("path/to/file").await?;
+let meta = reader.metadata();
+if let Some(etag) = meta.etag() {
+    println!("ETag: {}", etag);
+}
+let data = reader.read(..).await?;
+```
+The new API will be optional, and users can still use the existing `reader` 
methods without any changes.
+
+For backward compatibility and to minimize migration costs, We won't change 
the existing `read` API. Anyone who wants 
+to obtain metadata during reading can use the new reader operations instead.
+
+# Reference-level explanation
+
+## Changes to `Reader` API
+
+The `impl Reader` will be modified to include a new function `metadata()` that 
returns metadata.
+
+```rust
+impl Reader {
+    // Existing fields...
+    
+    fn metadata(&self) -> &Metadata {}
+}
+```
+
+## Changes to trait `oio::Read`
+
+The `Read` trait will be modified to include a new function `metadata()` that 
returns metadata.
+
+```rust
+pub trait Read {
+    // Existing functions...
+    
+    fn metadata(&self) -> Metadata;

Review Comment:
   Such API also changes the fact that all APIs for `oio::Read` takes `&mut 
self`. This can lead to `oio::Read` is not `Sync` anymore.



##########
core/src/docs/rfcs/5871_read_returns_metadata.md:
##########
@@ -0,0 +1,117 @@
+- Proposal Name: `read_returns_metadata`
+- Start Date: 2025-03-24
+- RFC PR: [apache/opendal#5871](https://github.com/apache/opendal/pull/5871)
+- Tracking Issue: 
[apache/opendal#5872](https://github.com/apache/opendal/issues/5872)
+
+# Summary
+
+Enhance read operations by returning metadata along with data in read 
operations.
+
+# Motivation
+
+Currently, read operations (`read`, `read_with`, `reader`, `reader_with`) only 
return the data content. Users who need metadata 
+during reads (like `Content-Type`, `ETag`, `version_id`, etc.) must make an 
additional `stat()` call. This is inefficient and 
+can lead to race conditions if the file is modified between the read and stat 
operations.
+
+Many storage services (like S3, GCS, Azure Blob) return metadata in their read 
responses. For example, S3's GetObject API returns 
+important metadata like `ContentType`, `ETag`, `VersionId`, `LastModified`, 
etc. We should expose this information to users 
+directly during read operations.
+
+# Guide-level explanation
+
+For `reader` API, we will introduce a new method `metadata()` that returns 
metadata:
+
+```rust
+// Before
+let data = op.reader("path/to/file").await?.read(..).await?;
+let meta = op.stat("path/to/file").await?;
+if let Some(etag) = meta.etag() {
+    println!("ETag: {}", etag);
+}
+
+// After
+let reader = op.reader("path/to/file").await?;
+let meta = reader.metadata();
+if let Some(etag) = meta.etag() {
+    println!("ETag: {}", etag);
+}
+let data = reader.read(..).await?;
+```
+The new API will be optional, and users can still use the existing `reader` 
methods without any changes.
+
+For backward compatibility and to minimize migration costs, We won't change 
the existing `read` API. Anyone who wants 
+to obtain metadata during reading can use the new reader operations instead.
+
+# Reference-level explanation
+
+## Changes to `Reader` API
+
+The `impl Reader` will be modified to include a new function `metadata()` that 
returns metadata.
+
+```rust
+impl Reader {
+    // Existing fields...
+    
+    fn metadata(&self) -> &Metadata {}
+}
+```
+
+## Changes to trait `oio::Read`
+
+The `Read` trait will be modified to include a new function `metadata()` that 
returns metadata.
+
+```rust
+pub trait Read {
+    // Existing functions...
+    
+    fn metadata(&self) -> Metadata;
+}
+```
+
+## Changes to struct `http_util::HttpBody`
+
+The `HttpBody` struct will be modified to include a new field for metadata.
+
+
+
+## Implementation Details
+
+For services that return metadata in their read responses:
+- The metadata will be captured from the service response.
+- All available fields (content_type, etag, version_id, last_modified, etc.) 
will be populated
+
+For services that don't return metadata in read responses:
+- for `fs`: we can use `stat` to retrieve the metadata before returning. Since 
the metadata is cached by the kernel, this should be efficient
+- for other services: A default metadata object will be returned

Review Comment:
   We can call a `stat` instead.



##########
core/src/docs/rfcs/5871_read_returns_metadata.md:
##########
@@ -0,0 +1,117 @@
+- Proposal Name: `read_returns_metadata`
+- Start Date: 2025-03-24
+- RFC PR: [apache/opendal#5871](https://github.com/apache/opendal/pull/5871)
+- Tracking Issue: 
[apache/opendal#5872](https://github.com/apache/opendal/issues/5872)
+
+# Summary
+
+Enhance read operations by returning metadata along with data in read 
operations.
+
+# Motivation
+
+Currently, read operations (`read`, `read_with`, `reader`, `reader_with`) only 
return the data content. Users who need metadata 
+during reads (like `Content-Type`, `ETag`, `version_id`, etc.) must make an 
additional `stat()` call. This is inefficient and 
+can lead to race conditions if the file is modified between the read and stat 
operations.
+
+Many storage services (like S3, GCS, Azure Blob) return metadata in their read 
responses. For example, S3's GetObject API returns 
+important metadata like `ContentType`, `ETag`, `VersionId`, `LastModified`, 
etc. We should expose this information to users 
+directly during read operations.
+
+# Guide-level explanation
+
+For `reader` API, we will introduce a new method `metadata()` that returns 
metadata:
+
+```rust
+// Before
+let data = op.reader("path/to/file").await?.read(..).await?;
+let meta = op.stat("path/to/file").await?;
+if let Some(etag) = meta.etag() {
+    println!("ETag: {}", etag);
+}
+
+// After
+let reader = op.reader("path/to/file").await?;
+let meta = reader.metadata();
+if let Some(etag) = meta.etag() {
+    println!("ETag: {}", etag);
+}
+let data = reader.read(..).await?;
+```
+The new API will be optional, and users can still use the existing `reader` 
methods without any changes.
+
+For backward compatibility and to minimize migration costs, We won't change 
the existing `read` API. Anyone who wants 
+to obtain metadata during reading can use the new reader operations instead.
+
+# Reference-level explanation
+
+## Changes to `Reader` API
+
+The `impl Reader` will be modified to include a new function `metadata()` that 
returns metadata.
+
+```rust
+impl Reader {
+    // Existing fields...
+    
+    fn metadata(&self) -> &Metadata {}
+}
+```
+
+## Changes to trait `oio::Read`
+
+The `Read` trait will be modified to include a new function `metadata()` that 
returns metadata.
+
+```rust
+pub trait Read {
+    // Existing functions...
+    
+    fn metadata(&self) -> Metadata;
+}
+```
+
+## Changes to struct `http_util::HttpBody`
+
+The `HttpBody` struct will be modified to include a new field for metadata.
+
+
+
+## Implementation Details
+
+For services that return metadata in their read responses:
+- The metadata will be captured from the service response.
+- All available fields (content_type, etag, version_id, last_modified, etc.) 
will be populated
+
+For services that don't return metadata in read responses:
+- for `fs`: we can use `stat` to retrieve the metadata before returning. Since 
the metadata is cached by the kernel, this should be efficient
+- for other services: A default metadata object will be returned
+
+Special considerations:
+- We should always return total object size in the metadata, even if it's not 
part of the read response
+- For range reads, the metadata should reflect the full object's properties 
(like total size) rather than the range
+- For versioned objects, the metadata should include version information if 
available
+
+# Drawbacks
+
+- Additional memory overhead for storing metadata during reads
+- Potential complexity in handling metadata for range reads
+
+# Rationale and alternatives
+
+- Maintains full backward compatibility with existing read operations
+- Improves performance by avoiding additional stat calls
+- Aligns with common storage service APIs (S3, GCS, Azure)
+
+# Prior art
+
+Similar patterns exist in other storage SDKs:
+
+- `object_store` crate returns metadata in `GetResult` after calling `get_opts`
+- AWS S3 SDK returns comprehensive metadata in `GetObjectOutput`
+- Azure Blob SDK returns properties and metadata in `DownloadResponse`
+
+# Unresolved questions
+
+None
+
+# Future possibilities
+
+None

Review Comment:
   There some size related tricks in `ReadContext::parse_into_range`: 
   
   
https://github.com/apache/opendal/blob/8c72db973a97cf94b3acbb9744933b8e0d635815/core/src/types/context/read.rs#L72-L103
   
   After this change, we can remove then entirely.



##########
core/src/docs/rfcs/5871_read_returns_metadata.md:
##########
@@ -0,0 +1,117 @@
+- Proposal Name: `read_returns_metadata`
+- Start Date: 2025-03-24
+- RFC PR: [apache/opendal#5871](https://github.com/apache/opendal/pull/5871)
+- Tracking Issue: 
[apache/opendal#5872](https://github.com/apache/opendal/issues/5872)
+
+# Summary
+
+Enhance read operations by returning metadata along with data in read 
operations.
+
+# Motivation
+
+Currently, read operations (`read`, `read_with`, `reader`, `reader_with`) only 
return the data content. Users who need metadata 
+during reads (like `Content-Type`, `ETag`, `version_id`, etc.) must make an 
additional `stat()` call. This is inefficient and 
+can lead to race conditions if the file is modified between the read and stat 
operations.
+
+Many storage services (like S3, GCS, Azure Blob) return metadata in their read 
responses. For example, S3's GetObject API returns 
+important metadata like `ContentType`, `ETag`, `VersionId`, `LastModified`, 
etc. We should expose this information to users 
+directly during read operations.
+
+# Guide-level explanation
+
+For `reader` API, we will introduce a new method `metadata()` that returns 
metadata:
+
+```rust
+// Before
+let data = op.reader("path/to/file").await?.read(..).await?;
+let meta = op.stat("path/to/file").await?;
+if let Some(etag) = meta.etag() {
+    println!("ETag: {}", etag);
+}
+
+// After
+let reader = op.reader("path/to/file").await?;
+let meta = reader.metadata();
+if let Some(etag) = meta.etag() {
+    println!("ETag: {}", etag);
+}
+let data = reader.read(..).await?;
+```
+The new API will be optional, and users can still use the existing `reader` 
methods without any changes.
+
+For backward compatibility and to minimize migration costs, We won't change 
the existing `read` API. Anyone who wants 
+to obtain metadata during reading can use the new reader operations instead.
+
+# Reference-level explanation
+
+## Changes to `Reader` API
+
+The `impl Reader` will be modified to include a new function `metadata()` that 
returns metadata.
+
+```rust
+impl Reader {
+    // Existing fields...
+    
+    fn metadata(&self) -> &Metadata {}
+}
+```
+
+## Changes to trait `oio::Read`
+
+The `Read` trait will be modified to include a new function `metadata()` that 
returns metadata.
+
+```rust
+pub trait Read {
+    // Existing functions...
+    
+    fn metadata(&self) -> Metadata;
+}
+```
+
+## Changes to struct `http_util::HttpBody`
+
+The `HttpBody` struct will be modified to include a new field for metadata.
+
+
+
+## Implementation Details
+
+For services that return metadata in their read responses:
+- The metadata will be captured from the service response.
+- All available fields (content_type, etag, version_id, last_modified, etc.) 
will be populated
+
+For services that don't return metadata in read responses:
+- for `fs`: we can use `stat` to retrieve the metadata before returning. Since 
the metadata is cached by the kernel, this should be efficient
+- for other services: A default metadata object will be returned
+
+Special considerations:
+- We should always return total object size in the metadata, even if it's not 
part of the read response
+- For range reads, the metadata should reflect the full object's properties 
(like total size) rather than the range

Review Comment:
   Agreed.



##########
core/src/docs/rfcs/5871_read_returns_metadata.md:
##########
@@ -0,0 +1,117 @@
+- Proposal Name: `read_returns_metadata`
+- Start Date: 2025-03-24
+- RFC PR: [apache/opendal#5871](https://github.com/apache/opendal/pull/5871)
+- Tracking Issue: 
[apache/opendal#5872](https://github.com/apache/opendal/issues/5872)
+
+# Summary
+
+Enhance read operations by returning metadata along with data in read 
operations.
+
+# Motivation
+
+Currently, read operations (`read`, `read_with`, `reader`, `reader_with`) only 
return the data content. Users who need metadata 
+during reads (like `Content-Type`, `ETag`, `version_id`, etc.) must make an 
additional `stat()` call. This is inefficient and 
+can lead to race conditions if the file is modified between the read and stat 
operations.
+
+Many storage services (like S3, GCS, Azure Blob) return metadata in their read 
responses. For example, S3's GetObject API returns 
+important metadata like `ContentType`, `ETag`, `VersionId`, `LastModified`, 
etc. We should expose this information to users 
+directly during read operations.
+
+# Guide-level explanation
+
+For `reader` API, we will introduce a new method `metadata()` that returns 
metadata:
+
+```rust
+// Before
+let data = op.reader("path/to/file").await?.read(..).await?;
+let meta = op.stat("path/to/file").await?;
+if let Some(etag) = meta.etag() {
+    println!("ETag: {}", etag);
+}
+
+// After
+let reader = op.reader("path/to/file").await?;
+let meta = reader.metadata();
+if let Some(etag) = meta.etag() {
+    println!("ETag: {}", etag);
+}
+let data = reader.read(..).await?;
+```
+The new API will be optional, and users can still use the existing `reader` 
methods without any changes.
+
+For backward compatibility and to minimize migration costs, We won't change 
the existing `read` API. Anyone who wants 
+to obtain metadata during reading can use the new reader operations instead.
+
+# Reference-level explanation
+
+## Changes to `Reader` API
+
+The `impl Reader` will be modified to include a new function `metadata()` that 
returns metadata.
+
+```rust
+impl Reader {
+    // Existing fields...
+    
+    fn metadata(&self) -> &Metadata {}
+}
+```
+
+## Changes to trait `oio::Read`

Review Comment:
   We don't need this. The returning metadata can be carried in `RpRead` which 
is created at the same time along with `oio::Read`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to