mro68 opened a new issue, #7057:
URL: https://github.com/apache/opendal/issues/7057

   **Title:** bug(services/gdrive): `list()` returns size=0 for all files - 
missing size field in API call and metadata mapping
   
   **Labels:** `bug`, `services/gdrive`, `good first issue`
   
   ---
   
   ## Bug Description
   
   The Google Drive backend returns `content_length = 0` for all files when 
using `list()` operations, even though the files are not empty.  This breaks 
any application relying on file size metadata. 
   
   Interestingly, `stat()` **works correctly** and returns the actual file 
size, indicating this is a specific bug in the `list()` implementation.
   
   ## Impact
   
   **Severity:  Critical** - Makes Google Drive backend unusable for: 
   - ✗ Backup tools (rustic, restic) - checksum validation fails
   - ✗ File synchronization - size-based deduplication broken  
   - ✗ Repository integrity checks - reports false errors
   - ✗ Any application using file sizes from listings
   
   ## Reproduction
   
   ```rust
   use opendal:: Operator;
   use opendal:: services:: Gdrive;
   
   #[tokio::main]
   async fn main() -> Result<()> {
       let mut builder = Gdrive::default()
           .root("/test")
           .access_token("<token>");
       
       let op = Operator::new(builder)?.finish();
       
       // Upload a 1MB test file
       let test_data = vec![0u8; 1_000_000];
       op.write("testfile.bin", test_data).await?;
       
       // stat() works correctly ✓
       let stat_meta = op.stat("testfile. bin").await?;
       println!("stat() size: {}", stat_meta.content_length());  
       // Output: stat() size: 1000000 ✓
       
       // list() returns wrong size ✗
       let mut lister = op.lister("/").await?;
       while let Some(entry) = lister.try_next().await? {
           if entry.path() == "testfile.bin" {
               let list_meta = entry.metadata();
               println!("list() size: {}", list_meta.content_length());
               // Output:  list() size: 0 ✗ BUG! 
           }
       }
       
       Ok(())
   }
   ```
   
   **Real-world example from rustic backup tool:**
   
   ```bash
   $ rustic repoinfo
   
   | File type | Count | Total Size |
   |-----------|-------|------------|
   | Pack      |  1572 |        0 B |  ← Should be ~105 GiB! 
   
   $ rustic check
   [ERROR] pack 003a836a:  size computed by index:  76969720, actual size: 0
   [ERROR] pack 008b374d: size computed by index: 77415589, actual size: 0
   ```
   
   The index is **correct** - OpenDAL just returns wrong metadata!
   
   ## Root Cause Analysis
   
   I've identified **two bugs** in the Google Drive listing implementation:
   
   ### Bug 1: Missing `size` field in API request
   
   **File:** `core/src/services/gdrive/core.rs`
   
   The `gdrive_list()` function (implementation not visible in search results, 
but called from `lister.rs:54`) likely does **not** request the `size` field 
from Google Drive API.
   
   **Comparison with working `stat()` implementation:**
   
   ```rust
   // core. rs line 61-65 - stat() implementation (WORKS ✓)
   pub async fn gdrive_stat(&self, path: &str) -> Result<Response<Buffer>> {
       // ... 
       let mut req = Request::get(format!(
           
"https://www.googleapis.com/drive/v3/files/{file_id}?fields=id,name,mimeType,size,modifiedTime";
           //                                                                   
         ^^^^ size is requested!
       ))
   ```
   
   The `gdrive_list()` function probably looks like: 
   ```rust
   // Suspected bug in gdrive_list():
   let url = format!(
       "https://www.googleapis.com/drive/v3/files? 
q='{parent_id}'+in+parents&fields=nextPageToken,files(id,name,mimeType,parents,trashed)"
       //                                                                       
                                   ^^^^^^^ size is MISSING!
   );
   ```
   
   ### Bug 2: Missing metadata mapping in lister
   
   **File:** `core/src/services/gdrive/lister.rs` (lines 77-99)
   
   Even if the API returns the size, the lister doesn't map it to the metadata: 
   
   ```rust
   // lister.rs line 77-95 - Current implementation (BROKEN ✗)
   for mut file in decoded_response.files {
       let file_type = if file.mime_type.as_str() == 
"application/vnd.google-apps.folder" {
           if ! file.name.ends_with('/') {
               file.name += "/";
           }
           EntryMode::DIR
       } else {
           EntryMode::FILE
       };
   
       let root = &self.core.root;
       let path = format!("{}{}", &self.path, file.name);
       let normalized_path = build_rel_path(root, &path);
   
       // BUG: Only file_type is set, no size or modified_time!
       let metadata = Metadata::new(file_type);  // ✗ Missing size!
       
       // Update path cache... 
       self.core.path_cache.insert(&path, &file.id).await;
       
       let e = oio::Entry::new(&normalized_path, metadata);
       ctx.entries.push_back(e);
   }
   ```
   
   **Compare with working `stat()` metadata mapping:**
   
   ```rust
   // backend.rs lines 73-85 - stat() implementation (WORKS ✓)
   let mut meta = 
Metadata::new(file_type).with_content_type(gdrive_file.mime_type);
   
   if let Some(v) = gdrive_file.size {
       meta = meta.with_content_length(v. parse::<u64>().map_err(|e| {
           Error::new(ErrorKind:: Unexpected, "parse content 
length").set_source(e)
       })?);
   }
   
   if let Some(v) = gdrive_file.modified_time {
       meta = meta.with_last_modified(v.parse::<Timestamp>().map_err(|e| {
           Error::new(ErrorKind::Unexpected, "parse last modified 
time").set_source(e)
       })?);
   }
   ```
   
   ## Proposed Fix
   
   ### Fix 1: Update `gdrive_list()` API call
   
   **File:** `core/src/services/gdrive/core.rs`
   
   In the `gdrive_list()` function, update the `fields` parameter to include 
`size` and `modifiedTime`:
   
   ```rust
   // Current (suspected):
   let url = format!(
       "https://www.googleapis.com/drive/v3/files?q='{parent_id}'+in+parents\
        &pageSize={page_size}\
        &pageToken={page_token}\
        &fields=nextPageToken,files(id,name,mimeType,parents,trashed)"
   );
   
   // Should be:
   let url = format!(
       "https://www.googleapis.com/drive/v3/files?q='{parent_id}'+in+parents\
        &pageSize={page_size}\
        &pageToken={page_token}\
        
&fields=nextPageToken,files(id,name,mimeType,size,modifiedTime,parents,trashed)"
       //                                              ^^^^ ^^^^^^^^^^^^ ADD 
THESE
   );
   ```
   
   ### Fix 2: Map metadata in lister
   
   **File:** `core/src/services/gdrive/lister.rs` (around line 77)
   
   Update the metadata creation to match the `stat()` implementation:
   
   ```rust
   for mut file in decoded_response.files {
       let file_type = if file.mime_type.as_str() == 
"application/vnd.google-apps.folder" {
           if !file.name.ends_with('/') {
               file.name += "/";
           }
           EntryMode:: DIR
       } else {
           EntryMode::FILE
       };
   
       let root = &self. core.root;
       let path = format!("{}{}", &self.path, file.name);
       let normalized_path = build_rel_path(root, &path);
   
       // ✅ FIX: Add size and modified_time to metadata (same as stat())
       let mut metadata = Metadata::new(file_type)
           .with_content_type(file.mime_type.clone());
       
       if let Some(size) = file.size {
           metadata = metadata.with_content_length(
               size.parse::<u64>().map_err(|e| {
                   Error::new(ErrorKind::Unexpected, "parse content 
length").set_source(e)
               })?
           );
       }
       
       if let Some(modified_time) = file.modified_time {
           metadata = metadata. with_last_modified(
               modified_time.parse::<Timestamp>().map_err(|e| {
                   Error::new(ErrorKind:: Unexpected, "parse last modified 
time").set_source(e)
               })?
           );
       }
   
       // Update path cache...
       self.core.path_cache.insert(&path, &file.id).await;
       
       let e = oio::Entry::new(&normalized_path, metadata);
       ctx.entries.push_back(e);
   }
   ```
   
   **Note:** This requires adding `size` and `modified_time` fields to the 
`GdriveFile` struct if not already present. 
   
   ## Test Case
   
   ```rust
   #[tokio::test]
   async fn test_list_includes_file_size() -> Result<()> {
       let op = new_gdrive_operator().await?;
       
       // Create test file with known size
       let test_data = vec![42u8; 12345];
       let path = "test_list_size.bin";
       op.write(path, test_data. clone()).await?;
       
       // Verify stat() returns correct size (should already pass)
       let stat_meta = op.stat(path).await?;
       assert_eq!(stat_meta.content_length(), 12345);
       
       // Verify list() returns correct size (currently FAILS)
       let mut lister = op.lister("/").await?;
       let mut found = false;
       while let Some(entry) = lister.try_next().await? {
           if entry.path().ends_with(path) {
               let list_meta = entry.metadata();
               assert_eq!(list_meta.content_length(), 12345); // ← Currently 
returns 0
               found = true;
               break;
           }
       }
       assert!(found, "File not found in listing");
       
       // Cleanup
       op.delete(path).await?;
       Ok(())
   }
   ```
   
   ## Environment
   
   - **OpenDAL version:** Latest (tested via rustic 0.10.x)
   - **Service:** Google Drive API v3
   - **Affected operations:** `list()`, `lister()`
   - **Working operations:** `stat()`
   
   ## Related Issues
   
   - #6684 - Google Drive behavior test failures (might be related to missing 
metadata)
   - #6323 - Discussion about size=0 ambiguity (this bug contributes to that 
confusion)
   
   ## Additional Context
   
   **Why this wasn't caught in tests:**
   
   The behavior tests might not be checking `content_length()` from list 
results, only from `stat()` results. 
   
   **Performance note:**
   
   There's also a secondary performance issue with Google Drive listing 
(excessive API calls), but that's a separate bug.  This issue focuses solely on 
the missing size metadata.
   
   ## Are you willing to submit a PR? 
   
   - [ ] Yes, I can submit a PR with the fix
   - [X] I've identified the exact location and solution, but would appreciate 
if a maintainer could implement it
   
   ## References
   
   - [Google Drive API Files. 
list](https://developers.google.com/drive/api/v3/reference/files/list)
   - [Google Drive API Fields 
parameter](https://developers.google.com/drive/api/guides/fields-parameter)
   - Working implementation: `core/src/services/gdrive/backend.rs` lines 73-85 
(stat)
   - Broken implementation: `core/src/services/gdrive/lister.rs` lines 77-95 
(list)
   
   ---
   
   **Impact Summary:**
   - ⚠️ Critical bug affecting production use
   - ✅ Easy fix (2 small changes)
   - ✅ No API changes needed (Google Drive already provides size)
   - ✅ Exact code locations identified
   - ✅ Solution pattern already exists in codebase (stat implementation)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to