xushiyan opened a new pull request, #596:
URL: https://github.com/apache/hudi-rs/pull/596
## Description
Introduce file format abstraction and Lance reading support for hudi-rs
(Phase 1 of vector search implementation plan).
- Define a `BaseFileReader` trait to decouple file format reading from
Parquet-specific code, with `ParquetBaseFileReader` wrapping the existing
`Storage` methods
- Integrate `BaseFileReader` into `FileGroupReader` and `Storage` with
config-based and extension-fallback format dispatch
- Add `lance` and `lance-io` as optional dependencies behind a `lance`
feature flag (requires Rust 1.91+)
- Implement `LanceBaseFileReader` for reading Lance datasets via the
`BaseFileReader` trait
- Add custom `HudiScanExec` (`ExecutionPlan`) for DataFusion to handle Lance
reads and MOR snapshot queries that cannot use `ParquetSource`
- Preserve existing `ParquetSource` fast path for Parquet COW and MOR
read-optimized queries
- Add v9 Lance nonpartitioned COW test table fixture and corresponding
integration tests
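
For reviewers, the dispatch rules described above can be sketched roughly as follows. This is an illustration only, not the code in this PR: `BaseFileFormat`, `QueryKind`, `ScanPath`, `resolve_format`, and `choose_scan_path` are hypothetical names, and the sketch assumes that an explicit config value takes precedence, with the file extension used only when the config is absent or unrecognized.

```rust
use std::path::Path;

/// Hypothetical enum of supported base file formats (illustrative names).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum BaseFileFormat {
    Parquet,
    Lance,
}

impl BaseFileFormat {
    /// Parse a format name case-insensitively; unknown names yield `None`.
    fn from_name(name: &str) -> Option<Self> {
        match name.to_ascii_lowercase().as_str() {
            "parquet" => Some(Self::Parquet),
            "lance" => Some(Self::Lance),
            _ => None,
        }
    }
}

/// Assumed dispatch order: a recognized config value wins; otherwise
/// fall back to the extension of the base file path.
fn resolve_format(configured: Option<&str>, path: &str) -> Option<BaseFileFormat> {
    if let Some(format) = configured.and_then(BaseFileFormat::from_name) {
        return Some(format);
    }
    Path::new(path)
        .extension()
        .and_then(|ext| ext.to_str())
        .and_then(BaseFileFormat::from_name)
}

/// Hypothetical query kinds, mirroring the cases named in the description.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum QueryKind {
    CowSnapshot,
    MorReadOptimized,
    MorSnapshot,
}

/// Which DataFusion scan path would serve the query (illustrative).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ScanPath {
    ParquetSource,
    HudiScanExec,
}

/// Parquet COW and MOR read-optimized queries keep the `ParquetSource`
/// fast path; Lance reads and MOR snapshot queries use the custom plan.
fn choose_scan_path(format: BaseFileFormat, query: QueryKind) -> ScanPath {
    match (format, query) {
        (BaseFileFormat::Parquet, QueryKind::CowSnapshot | QueryKind::MorReadOptimized) => {
            ScanPath::ParquetSource
        }
        _ => ScanPath::HudiScanExec,
    }
}
```

Under these assumptions, `resolve_format(None, "data/part-0.lance")` resolves to Lance from the extension alone, and any Lance read is routed to `HudiScanExec` regardless of query kind.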
## How are the changes test-covered
- [ ] N/A
- [x] Automated tests (unit and/or integration tests)
- [ ] Manual tests
- [ ] Details are described below