JingsongLi opened a new pull request, #231:
URL: https://github.com/apache/paimon-rust/pull/231
### Purpose
Subtask of #227
Introduce a complete Tantivy-based full-text search pipeline for global
indexes, with on-demand I/O throughout:
- `ArchiveDirectory`: reads only the archive header eagerly; file data is loaded lazily through the async `FileRead` when Tantivy requests it, using a sync-to-async bridge built on `std::thread::scope`.
- `TantivyFullTextWriter`: streams the packed archive directly to an `OutputFile` instead of buffering it in memory.
- `TantivyFullTextReader`: opens from an `InputFile`/`FileRead` and never loads the full archive into memory.
- `FullTextSearchBuilder`: a self-contained builder on `Table` that reads the index manifest, evaluates searches against multiple Tantivy indexes in parallel (`try_join_all`), and returns a `ScoredGlobalIndexResult`.
- `ScoredGlobalIndexResult` and `bitmap_to_ranges` moved to `table/source.rs` (alongside `RowRange`) so that vector search can reuse them later.
- `TableScan.with_row_ranges()`: generic row-range filtering, decoupled from full-text specifics.
- DataFusion `full_text_search` UDTF (user-defined table function) integration, with test data.
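The on-demand read path of `ArchiveDirectory` can be illustrated with a minimal, std-only sketch. It assumes the archive header has already been parsed into a map from file name to byte range, and it substitutes a hypothetical `InMemoryFileRead` for Paimon's async `FileRead`; `LazyArchive`, `block_on`, and `ThreadWaker` are likewise illustrative names, not the PR's API. A real implementation would more likely drive the future with the existing async runtime (for example a Tokio handle) than with the hand-rolled executor used here.

```rust
use std::collections::HashMap;
use std::future::Future;
use std::io;
use std::ops::Range;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

/// Wakes the thread parked inside `block_on`.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// Minimal single-future executor (illustrative): poll, park until woken, repeat.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(out) => return out,
            Poll::Pending => thread::park(),
        }
    }
}

/// Hypothetical stand-in for an async, range-addressable file reader.
struct InMemoryFileRead {
    data: Vec<u8>,
}

impl InMemoryFileRead {
    async fn read(&self, range: Range<u64>) -> io::Result<Vec<u8>> {
        self.data
            .get(range.start as usize..range.end as usize)
            .map(|bytes| bytes.to_vec())
            .ok_or_else(|| io::Error::new(io::ErrorKind::UnexpectedEof, "range out of bounds"))
    }
}

/// Hypothetical lazy archive: only the header (file name -> byte range) lives
/// in memory; file contents stay in the backing reader until requested.
struct LazyArchive {
    entries: HashMap<String, Range<u64>>,
    source: InMemoryFileRead,
}

impl LazyArchive {
    /// Synchronous read, the shape a sync caller such as Tantivy needs.
    fn read_file(&self, name: &str) -> io::Result<Vec<u8>> {
        let range = self
            .entries
            .get(name)
            .cloned()
            .ok_or_else(|| io::Error::new(io::ErrorKind::NotFound, name.to_string()))?;
        let source = &self.source;
        // Drive the async read to completion on a scoped helper thread; the
        // scope lets that thread borrow `source` without a 'static bound.
        thread::scope(|s| s.spawn(move || block_on(source.read(range))).join())
            .expect("reader thread panicked")
    }
}

fn main() {
    let archive = LazyArchive {
        entries: HashMap::from([
            ("meta.json".to_string(), 0..2),
            ("terms.idx".to_string(), 2..5),
        ]),
        source: InMemoryFileRead { data: b"{}abc".to_vec() },
    };
    let bytes = archive.read_file("terms.idx").unwrap();
    println!("{}", String::from_utf8(bytes).unwrap()); // prints "abc"
}
```

Running the blocking wait on a separate scoped thread is presumably what keeps a synchronous Tantivy call from re-entering the async runtime on one of its own worker threads (Tokio's `Handle::block_on`, for instance, panics when called from within an asynchronous execution context), while `std::thread::scope` still allows borrowing the reader.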
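The role of `bitmap_to_ranges` can be sketched as collapsing the sorted row IDs matched by an index into contiguous row ranges, which a scan such as `TableScan.with_row_ranges()` can then prune against. This sketch assumes an inclusive `RowRange { from, to }` and uses a `BTreeSet<u64>` as a stand-in for the bitmap type; the actual field names, bounds, and bitmap implementation in `table/source.rs` may differ, and `row_in_ranges` is a hypothetical helper.

```rust
use std::collections::BTreeSet;

/// Hypothetical inclusive row range; the real `RowRange` may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct RowRange {
    from: u64,
    to: u64, // inclusive
}

/// Collapse sorted, unique row IDs into maximal contiguous ranges.
fn bitmap_to_ranges(rows: &BTreeSet<u64>) -> Vec<RowRange> {
    let mut ranges: Vec<RowRange> = Vec::new();
    for &row in rows {
        if let Some(last) = ranges.last_mut() {
            // Extend the current range when `row` directly follows it.
            // (`last.to + 1` cannot overflow: u64::MAX would be the final row.)
            if last.to + 1 == row {
                last.to = row;
                continue;
            }
        }
        // Otherwise start a new single-row range.
        ranges.push(RowRange { from: row, to: row });
    }
    ranges
}

/// Hypothetical membership check over sorted, disjoint ranges (binary search).
fn row_in_ranges(ranges: &[RowRange], row: u64) -> bool {
    // Index of the first range whose upper bound is >= row.
    let i = ranges.partition_point(|r| r.to < row);
    i < ranges.len() && ranges[i].from <= row
}

fn main() {
    let hits = BTreeSet::from([1u64, 2, 3, 7, 9, 10]);
    let ranges = bitmap_to_ranges(&hits);
    // prints [RowRange { from: 1, to: 3 }, RowRange { from: 7, to: 7 }, RowRange { from: 9, to: 10 }]
    println!("{:?}", ranges);
    println!("row 8 selected: {}", row_in_ranges(&ranges, 8)); // prints "row 8 selected: false"
}
```

Because the ranges come out sorted and disjoint, checking whether a row survives the filter is a single binary search rather than a scan over every range.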
### Brief change log
### Tests
### API and Format
### Documentation