[PR] [tantivy] Reuse TantivySearcher across queries via searcher pool [paimon]

via GitHub Sun, 19 Apr 2026 06:17:11 -0700


chenghuichen opened a new pull request, #7671:
URL: https://github.com/apache/paimon/pull/7671


   ### Purpose
   Each full-text search query currently opens a fresh `TantivySearcher`, which 
rebuilds the Rust-side index structures (including loading the `.term` FST 
dictionary) from scratch. On object storage (S3/OSS), this means a full GET of 
the index file on every query. In Flink streaming pipelines, the primary JVM 
consumer of Paimon's global index, the same index shard is queried 
continuously within a single subtask lifetime, so reloading it for every query 
is pure waste.
   
   This PR introduces a `TantivySearcherPool` that keeps `TantivySearcher` 
instances alive across queries: a searcher is borrowed when a query starts and 
returned to the pool on close, rather than being destroyed and rebuilt.
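
As a rough illustration of the borrow-on-start, return-on-close pattern described above, here is a minimal sketch. It is not the code in this PR: `SearcherPoolSketch`, `Lease`, `FakeSearcher`, and the choice to key the pool by index file path are all assumptions made for the example, and the real `TantivySearcherPool` may differ in every one of them.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical stand-in for TantivySearcher; counts how often one is opened. */
final class FakeSearcher {
    static int opens = 0;
    final String indexPath;

    FakeSearcher(String indexPath) {
        this.indexPath = indexPath;
        opens++; // in the real searcher this is the expensive load of the index
    }
}

/** Illustrative pool keyed by index file path (an assumption, not the PR's design). */
final class SearcherPoolSketch<S> {
    private final Map<String, Deque<S>> idle = new HashMap<>();
    private final Function<String, S> opener;

    SearcherPoolSketch(Function<String, S> opener) {
        this.opener = opener;
    }

    /** Hands out an idle searcher for the path, opening a new one only if none is idle. */
    synchronized Lease borrow(String indexPath) {
        Deque<S> queue = idle.get(indexPath);
        // Opening under the lock keeps the sketch short; a real pool would likely avoid that.
        S searcher = (queue == null || queue.isEmpty()) ? opener.apply(indexPath) : queue.pop();
        return new Lease(indexPath, searcher);
    }

    private synchronized void giveBack(String indexPath, S searcher) {
        idle.computeIfAbsent(indexPath, k -> new ArrayDeque<>()).push(searcher);
    }

    /** Closing a lease returns the searcher to the pool instead of destroying it. */
    final class Lease implements AutoCloseable {
        private final String indexPath;
        private final S searcher;
        private boolean closed;

        Lease(String indexPath, S searcher) {
            this.indexPath = indexPath;
            this.searcher = searcher;
        }

        S searcher() {
            return searcher;
        }

        @Override
        public void close() {
            if (!closed) { // guard against returning the same searcher twice
                closed = true;
                giveBack(indexPath, searcher);
            }
        }
    }
}

final class SearcherPoolDemo {
    public static void main(String[] args) {
        SearcherPoolSketch<FakeSearcher> pool = new SearcherPoolSketch<>(FakeSearcher::new);
        for (int i = 0; i < 3; i++) {
            try (SearcherPoolSketch<FakeSearcher>.Lease lease = pool.borrow("index/shard-0")) {
                lease.searcher(); // the query would run against this searcher
            }
        }
        System.out.println("opens = " + FakeSearcher.opens); // prints "opens = 1"
    }
}
```

Three sequential queries against the same path trigger a single open in this sketch. A pool keyed by path also never hands out an entry for a path that is no longer requested, so entries for old paths would simply sit idle; a production pool would additionally need a size bound or eviction policy, which the sketch leaves out.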
   
   
   ### Benefit Assessment
   
   Benchmark on local disk (500k docs, 17 MB index, 500 queries, JIT-warmed):
   
   ```
   No-pool:   avg=2.86 ms  (open=1.40 ms / 49%,  search=1.27 ms)
   With pool: avg=0.79 ms  (search only)
   Speedup:   3.62x
   ```
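
The reported figures can be cross-checked with a few lines of arithmetic. The snippet below is only a sanity check on the numbers quoted above; the class and method names are made up for the example and are not part of the benchmark harness.

```java
import java.util.Locale;

/** Recomputes the derived figures from the averages quoted in the benchmark. */
final class BenchmarkArithmetic {
    /** Ratio of the no-pool average latency to the pooled average latency. */
    static double speedup(double noPoolMs, double pooledMs) {
        return noPoolMs / pooledMs;
    }

    /** Fraction of the no-pool latency spent in the open phase. */
    static double openShare(double openMs, double totalMs) {
        return openMs / totalMs;
    }

    public static void main(String[] args) {
        // 2.86 / 0.79 is about 3.62, matching the reported speedup
        System.out.println(String.format(Locale.ROOT, "speedup = %.2fx", speedup(2.86, 0.79)));
        // 1.40 / 2.86 is about 0.49, matching the reported 49%
        System.out.println(String.format(Locale.ROOT, "open share = %.0f%%", openShare(1.40, 2.86) * 100));
    }
}
```

Note that open and search together account for 1.40 + 1.27 = 2.67 ms of the 2.86 ms no-pool average; the description does not break down the remaining 0.19 ms.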
   
   On object storage the gap is expected to widen further: the open phase 
includes a full GET of the `.term` file (FST dictionary, typically several MB 
per shard). With the pool, `.term` stays resident in Rust memory across 
queries, eliminating both the latency and the object storage data transfer 
cost of repeated loading. For tables under heavy compaction, index files are 
replaced by files at new paths; pool entries for the old paths simply go 
unused and do not affect correctness.
   
   
   ### Tests
   


