chenghuichen opened a new pull request, #7671: URL: https://github.com/apache/paimon/pull/7671
### Purpose

Each full-text search query currently opens a fresh `TantivySearcher`, which rebuilds the Rust-side index structures (including loading the `.term` FST dictionary) from scratch. On object storage (S3/OSS), this means a full GET of the index file on every query. In Flink streaming pipelines, the primary JVM consumer of Paimon's global index, the same index shard is queried continuously within a single subtask lifetime, making repeated loading pure waste.

This PR introduces a `TantivySearcherPool` that keeps `TantivySearcher` instances alive across queries, borrowing on query start and returning on close rather than destroying and rebuilding.

### Benefit Assessment

Benchmark on local disk (500k docs, 17MB index, 500 queries, JIT-warmed):

```
No-pool:   avg=2.86 ms (open=1.40 ms / 49%, search=1.27 ms)
With pool: avg=0.79 ms (search only)
Speedup:   3.62x
```

On object storage the gap widens further: the open phase includes a full GET of the `.term` file (FST dictionary, typically several MB per shard). With the pool, `.term` stays resident in Rust memory across queries, eliminating both the latency and the object storage data transfer cost of repeated loading.

For tables under heavy compaction, index files are replaced by new paths; stale pool entries go unused without affecting correctness.

### Tests
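
To illustrate the borrow-and-return pattern described under Purpose, here is a minimal Java sketch of a pool keyed by index file path. It is not the PR's `TantivySearcherPool`: the names `SearcherPoolSketch`, `Searcher`, `Lease`, `borrow`, and `openCount` are hypothetical, and the `Searcher` class is only a stand-in for an expensive-to-open `TantivySearcher`. Eviction, size limits, and native resource cleanup are deliberately left out.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical sketch of a keyed searcher pool; all names here are assumptions. */
final class SearcherPoolSketch {

    /** Stand-in for a searcher that is expensive to open (for example, one that loads the .term FST). */
    static final class Searcher {
        final String indexPath;

        Searcher(String indexPath) {
            this.indexPath = indexPath;
        }
    }

    /** Handle given to the caller; close() returns the searcher to the pool instead of destroying it. */
    static final class Lease implements AutoCloseable {
        private final SearcherPoolSketch pool;
        final Searcher searcher;
        private boolean closed;

        Lease(SearcherPoolSketch pool, Searcher searcher) {
            this.pool = pool;
            this.searcher = searcher;
        }

        @Override
        public void close() {
            if (!closed) { // guard against returning the same searcher twice
                closed = true;
                pool.giveBack(searcher);
            }
        }
    }

    private final Map<String, Deque<Searcher>> idle = new HashMap<>();
    private final Function<String, Searcher> opener;
    private int opens;

    SearcherPoolSketch(Function<String, Searcher> opener) {
        this.opener = opener;
    }

    /** Reuses an idle searcher for this path if one exists; otherwise opens a new one. */
    synchronized Lease borrow(String indexPath) {
        Deque<Searcher> queue = idle.get(indexPath);
        Searcher searcher = (queue == null) ? null : queue.pollFirst();
        if (searcher == null) {
            searcher = opener.apply(indexPath); // the expensive "open" phase
            opens++;
        }
        return new Lease(this, searcher);
    }

    private synchronized void giveBack(Searcher searcher) {
        idle.computeIfAbsent(searcher.indexPath, k -> new ArrayDeque<>()).addFirst(searcher);
    }

    /** Number of times the opener was invoked. */
    synchronized int openCount() {
        return opens;
    }

    public static void main(String[] args) {
        SearcherPoolSketch pool = new SearcherPoolSketch(Searcher::new);
        for (int i = 0; i < 3; i++) {
            try (Lease lease = pool.borrow("shard-0.index")) {
                // run the query with lease.searcher here
            }
        }
        System.out.println("opens=" + pool.openCount()); // prints "opens=1"
    }
}
```

Because entries are keyed by path, a searcher for an index file that compaction has replaced is simply never borrowed again, which matches the note above that stale entries go unused without affecting correctness. A production pool would presumably also need to bound and eventually release such entries.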
