mro68 opened a new pull request, #7059:
URL: https://github.com/apache/opendal/pull/7059

   ## Summary
   
   This PR implements a custom `GdriveFlatLister` that uses batched OR queries to list multiple directories in a single API call, substantially reducing both the number of API calls and the wall-clock time of recursive listing.
   
   **Depends on:** #7058 (needs size/modifiedTime metadata fix first)
   
   ## Motivation
   
   When OpenDAL's gdrive service is used for recursive listing (e.g., by backup tools like rustic), the generic `FlatLister` makes one API call per directory. For repositories with hundreds of subdirectories, this results in hundreds of sequential API calls, making the listing ~50x slower than rclone.
   
   ## Solution
   
   Inspired by rclone's approach, this PR implements batch queries using Google Drive's OR syntax:
   ```
   ('id1' in parents or 'id2' in parents or 'id3' in parents ...)
   ```
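A minimal sketch of how such a clause could be assembled from a list of directory IDs. `build_parents_query` is a hypothetical helper, not the PR's `gdrive_list_batch()`; it assumes IDs are escaped per Drive's query-string rules (backslash and single quote), which real Drive IDs rarely need but which keeps the clause well-formed.

```rust
/// Build a Drive `q` clause matching children of any of the given parents,
/// e.g. `('a' in parents or 'b' in parents)`.
/// Hypothetical helper for illustration only.
fn build_parents_query(parent_ids: &[&str]) -> String {
    let clauses: Vec<String> = parent_ids
        .iter()
        // Escape backslashes first, then single quotes, so the quoted
        // string literal inside the query stays valid.
        .map(|id| {
            let escaped = id.replace('\\', "\\\\").replace('\'', "\\'");
            format!("'{}' in parents", escaped)
        })
        .collect();
    // Parenthesize so the OR group can be safely combined with other filters.
    format!("({})", clauses.join(" or "))
}

fn main() {
    let q = build_parents_query(&["id1", "id2", "id3"]);
    // prints ('id1' in parents or 'id2' in parents or 'id3' in parents)
    println!("{q}");
}
```

The outer parentheses matter once the clause is joined with other conditions using `and`, since `and` binds tighter than `or` in Drive queries.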
   
   ### Key Changes
   
   1. **New `gdrive_list_batch()` method** in `core.rs` - Builds OR queries for multiple parent IDs
   2. **New `GdriveFlatLister`** in `flat_lister.rs` - Custom recursive lister with:
      - Batch processing of up to 50 parent IDs per query
      - Page size of 1000 (Google Drive API maximum)
      - Breadth-first (BFS) traversal that queues subdirectories for batched listing as they are discovered
   3. **Enable `list_with_recursive: true`** capability
   4. **Add `parents` field** to `GdriveFile` struct for batch parent resolution
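The 50-ID batching and the role of the new `parents` field can be illustrated with a small sketch. `DriveFile`, `batches`, and `resolve_path` are hypothetical names, and the struct keeps only the fields needed here. Because one batched response mixes children of many directories, each entry's `parents` list is what maps it back to the directory path it belongs under.

```rust
use std::collections::HashMap;

/// Hypothetical, trimmed-down stand-in for a Drive file entry.
struct DriveFile {
    name: String,
    parents: Vec<String>,
}

/// Maximum parent IDs per OR query, as described in this PR.
const BATCH_SIZE: usize = 50;

/// Split pending directory IDs into groups of at most `BATCH_SIZE`.
fn batches(dir_ids: &[String]) -> Vec<Vec<String>> {
    dir_ids.chunks(BATCH_SIZE).map(|c| c.to_vec()).collect()
}

/// Resolve a file's full path from the first parent ID whose path is known.
/// `dir_paths` maps directory ID -> path ending in `/`.
fn resolve_path(file: &DriveFile, dir_paths: &HashMap<String, String>) -> Option<String> {
    file.parents
        .iter()
        .find_map(|p| dir_paths.get(p))
        .map(|parent| format!("{}{}", parent, file.name))
}

fn main() {
    let ids: Vec<String> = (0..120).map(|i| format!("dir{i}")).collect();
    let sizes: Vec<usize> = batches(&ids).iter().map(|b| b.len()).collect();
    println!("{sizes:?}"); // [50, 50, 20]

    let mut dir_paths = HashMap::new();
    dir_paths.insert("dirA".to_string(), "photos/".to_string());
    let f = DriveFile { name: "cat.jpg".into(), parents: vec!["dirA".into()] };
    println!("{:?}", resolve_path(&f, &dir_paths)); // Some("photos/cat.jpg")
}
```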
   
   ### Performance Results
   
   | Metric | Before | After | Improvement |
   |--------|--------|-------|-------------|
   | Time (2000+ files) | ~55s | ~7.5s | **7x faster** |
   | API calls | ~260 | ~12 | **~20x fewer** |
   
   Tested with rustic backup tool against real Google Drive repositories.
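The headline ratios follow directly from the table's approximate figures:

$$
\frac{55\ \text{s}}{7.5\ \text{s}} \approx 7.3\times \text{ faster}, \qquad \frac{260\ \text{calls}}{12\ \text{calls}} \approx 21.7\times \text{ fewer calls}
$$

Time improves by less than the call count, which is plausible if each batched call returns more entries per request, though that breakdown was not measured here.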
   
   ## Technical Details
   
   The `GdriveFlatLister` uses a BFS approach:
   1. Start with the root directory ID
   2. Query up to 50 pending directories at once with a single OR query
   3. Process results: yield files, collect new directories
   4. Repeat until all directories are processed
   
   This is similar to how rclone implements `ListR` for Google Drive.
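The four BFS steps above can be sketched against an in-memory stand-in for Drive. This is a minimal sketch, not the PR's `GdriveFlatLister`: `Entry`, `Tree`, `list_batch`, and `flat_list` are hypothetical names, each batched query is simulated by a map lookup, and pagination (page size 1000 plus page tokens) is omitted. Directories are yielded with a trailing `/`, following OpenDAL's path convention.

```rust
use std::collections::{HashMap, VecDeque};

/// Hypothetical child entry returned by a listing call.
#[derive(Clone)]
struct Entry {
    id: String,
    name: String,
    is_dir: bool,
}

/// In-memory stand-in for Drive: parent ID -> children.
type Tree = HashMap<String, Vec<Entry>>;

const BATCH_SIZE: usize = 50;

/// Simulates one batched OR query: every child of any listed parent,
/// tagged with the parent it came from (what `parents` provides).
fn list_batch(tree: &Tree, parents: &[String]) -> Vec<(String, Entry)> {
    let mut out = Vec::new();
    for p in parents {
        if let Some(children) = tree.get(p) {
            for c in children {
                out.push((p.clone(), c.clone()));
            }
        }
    }
    out
}

/// BFS listing; returns (paths, number of simulated API calls).
fn flat_list(tree: &Tree, root_id: &str) -> (Vec<String>, usize) {
    let mut paths = Vec::new();
    let mut calls = 0;
    // Step 1: start from the root, which maps to the empty path.
    let mut dir_paths: HashMap<String, String> = HashMap::new();
    dir_paths.insert(root_id.to_string(), String::new());
    let mut pending: VecDeque<String> = VecDeque::from([root_id.to_string()]);

    while !pending.is_empty() {
        // Step 2: take up to BATCH_SIZE pending directories for one query.
        let n = pending.len().min(BATCH_SIZE);
        let batch: Vec<String> = pending.drain(..n).collect();
        calls += 1;
        // Step 3: yield every child and queue newly found directories.
        for (parent, entry) in list_batch(tree, &batch) {
            let path = format!("{}{}", dir_paths[&parent], entry.name);
            if entry.is_dir {
                let dir_path = format!("{path}/");
                paths.push(dir_path.clone());
                dir_paths.insert(entry.id.clone(), dir_path);
                pending.push_back(entry.id);
            } else {
                paths.push(path);
            }
        }
    } // Step 4: loop until no directories remain.
    (paths, calls)
}

fn main() {
    // Root with 120 subdirectories, each holding one file.
    let mut tree: Tree = HashMap::new();
    let mut root_children = Vec::new();
    for i in 0..120 {
        let id = format!("d{i}");
        root_children.push(Entry { id: id.clone(), name: format!("dir{i}"), is_dir: true });
        tree.insert(id, vec![Entry { id: format!("f{i}"), name: "data.bin".into(), is_dir: false }]);
    }
    tree.insert("root".into(), root_children);
    let (paths, calls) = flat_list(&tree, "root");
    println!("{} entries in {} calls", paths.len(), calls); // 240 entries in 4 calls
}
```

In this example the batched traversal needs 4 calls (1 for the root, then batches of 50, 50, and 20 directories), whereas listing one directory per call would need 121.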
   
   ## Checklist
   
   - [x] I have read the [CONTRIBUTING](https://github.com/apache/opendal/blob/main/CONTRIBUTING.md) documentation
   - [x] I have added tests that prove my fix is effective or that my feature works (behavior tests pass)
   - [x] This PR is based on #7058, which must be merged first


-- 
This is an automated message from the Apache Git Service.