400Ping opened a new pull request, #1174:
URL: https://github.com/apache/mahout/pull/1174

   ### Related Issues
   
   Closes #1162 
   
   ### Changes
   
   - [ ] Bug fix
   - [x] New feature
   - [ ] Refactoring
   - [x] Documentation
   - [x] Test
   - [ ] CI/CD pipeline
   - [ ] Other
   
   ## Summary
   
   This PR extends QDP remote object storage support from S3-only to S3 + GCS and clarifies remote URL semantics by accepting only `scheme://bucket/key` URLs, so query strings and fragments are no longer handled ambiguously.
   
   ### What this PR changes
   
   1. Add GCS remote URL support in `qdp-core`:
      - Support `gs://bucket/key` in remote path detection.
      - Build the GCS object store client via `GoogleCloudStorageBuilder::from_env`.
      - Enable the `gcp` feature of `object_store` in `qdp-core/Cargo.toml`.
   
   2. Clarify remote URL parsing behavior:
      - Reject remote URLs containing a query string or fragment (`?`, `#`) in core parsing.
      - Add matching validation in the Python loader builder so invalid URLs fail fast.
      - Keep the accepted form explicit: `scheme://bucket/key`.
   
   3. Update tests:
      - Add/extend Rust tests for `gs://` parsing and query/fragment rejection.
      - Update Python loader tests:
        - accept `s3://` and `gs://` remote paths
        - reject query/fragment remote URLs
        - keep streaming `.parquet` validation
   
   4. Update docs:
      - Add `s3://` / `gs://` remote usage examples.
      - Document the current limitation: query strings and fragments are not supported.
      - Sync docs in both `docs/qdp/*` and `website/versioned_docs/version-0.5/qdp/*`.
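
The fail-fast rule from item 2 can be sketched in Python. This is a minimal sketch under stated assumptions. `SUPPORTED_REMOTE_SCHEMES`, `is_remote_path` and `parse_remote_url` are hypothetical names and are not the actual `qdp-core` or Python loader-builder API. The sketch also assumes that any `?` or `#` anywhere in the string is rejected.

```python
# Hypothetical helpers illustrating the `scheme://bucket/key` rule; not the QDP API.
SUPPORTED_REMOTE_SCHEMES = ("s3", "gs")


def is_remote_path(path: str) -> bool:
    """Return True if `path` starts with a supported remote scheme (s3:// or gs://)."""
    return any(path.startswith(f"{scheme}://") for scheme in SUPPORTED_REMOTE_SCHEMES)


def parse_remote_url(path: str) -> tuple[str, str, str]:
    """Split `scheme://bucket/key` into (scheme, bucket, key).

    Raises ValueError on a query string or fragment, an unsupported scheme,
    or a missing bucket/key, so bad input fails before any I/O happens.
    """
    # Reject query strings and fragments outright instead of stripping them.
    if "?" in path or "#" in path:
        raise ValueError(f"remote URL must not contain a query or fragment: {path!r}")

    scheme, sep, rest = path.partition("://")
    if not sep or scheme not in SUPPORTED_REMOTE_SCHEMES:
        raise ValueError(f"unsupported remote URL scheme: {path!r}")

    # The first path segment is the bucket; everything after it is the object key.
    bucket, slash, key = rest.partition("/")
    if not bucket or not slash or not key:
        raise ValueError(f"expected scheme://bucket/key, got: {path!r}")
    return scheme, bucket, key
```

For example, `parse_remote_url("gs://my-bucket/data/train.parquet")` returns `("gs", "my-bucket", "data/train.parquet")`. By contrast, `parse_remote_url("s3://my-bucket/data.parquet?versionId=1")` raises `ValueError`. Rejecting these characters rather than stripping them keeps the accepted form unambiguous. The trade-off is that, in this sketch, object keys that literally contain `?` or `#` cannot be addressed.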
   
   ## Checklist
   
   - [x] Added or updated unit tests for all changes
   - [x] Added or updated documentation for all changes
   

