Re: [PR] [core] Support batch vector search [paimon]

via GitHub Thu, 11 Jun 2026 07:26:02 -0700


XiaoHongbo-Hope commented on PR #7857:
URL: https://github.com/apache/paimon/pull/7857#issuecomment-4681608516


   > This batch query is not meaningful for performance, we need to ensure that the local file cache is enabled. See [ce38b2c](https://github.com/apache/paimon/commit/ce38b2ccd7cbaabcfc1d5e90f68586ebecc66265)
   
   I ran an OSS benchmark with a 529MB Lumina index, 100 queries, dim=1024.
   ```
     Without Paimon local cache:
     - user-level loop: 188.6s
     - batch: 39.7s
   
     With warm local memory cache:
     - user-level loop: 41.4s
     - batch: 39.7s
   ```
   Without the cache, batch is about 4.75x faster than the user-level loop (188.6s vs 39.7s). With a warm cache, the gain shrinks to about 4% (41.4s vs 39.7s), which matches your point.
   
   Users still need a batch API, so for the Spark/Flink batch API, which direction do you prefer?
   1. Implement it by looping single-vector search internally and relying on the local cache.
   2. Continue this PR and add a batch vector search/readBatch path.
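
To make the two directions concrete, here is a minimal toy sketch in Python. It is not Paimon code: `ToyIndex`, `search`, `search_batch`, and `loop_search` are hypothetical names, and the `loads` counter only stands in for the cost of reading the index from remote storage. It shows why a warm cache brings the looped path close to the batch path: both end up loading the index once.

```python
from typing import List, Optional, Sequence

Vector = Sequence[float]


def _top_k(data: List[Vector], query: Vector, k: int) -> List[int]:
    """Return the positions of the k vectors closest to query (squared L2)."""
    def dist(v: Vector) -> float:
        return sum((a - b) ** 2 for a, b in zip(v, query))
    return sorted(range(len(data)), key=lambda i: dist(data[i]))[:k]


class ToyIndex:
    """Hypothetical stand-in for a vector index file; not a Paimon class."""

    def __init__(self, vectors: List[Vector]):
        self._vectors = vectors
        self.loads = 0  # counts simulated reads of the index from storage
        self._cached: Optional[List[Vector]] = None  # simulated local cache

    def _load(self, use_cache: bool) -> List[Vector]:
        if use_cache and self._cached is not None:
            return self._cached  # warm cache: no storage read
        self.loads += 1
        loaded = list(self._vectors)
        if use_cache:
            self._cached = loaded
        return loaded

    def search(self, query: Vector, k: int, use_cache: bool = False) -> List[int]:
        """Single-vector search; loads the index on every call unless cached."""
        return _top_k(self._load(use_cache), query, k)

    def search_batch(self, queries: List[Vector], k: int) -> List[List[int]]:
        """Batch search; loads the index once for all queries."""
        data = self._load(use_cache=False)
        return [_top_k(data, q, k) for q in queries]


def loop_search(index: ToyIndex, queries: List[Vector], k: int,
                use_cache: bool) -> List[List[int]]:
    """Option 1: a batch API built by looping single-vector search."""
    return [index.search(q, k, use_cache=use_cache) for q in queries]
```

In this toy model, looping without a cache loads the index once per query, while looping with a cache and the dedicated batch path each load it once. The remaining gap in the real benchmark (41.4s vs 39.7s) is per-call overhead that this sketch does not model.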


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
