XiaoHongbo-Hope commented on PR #7857: URL: https://github.com/apache/paimon/pull/7857#issuecomment-4681608516
> This batch query is not meaningful for performance, we need to ensure that the local file cache is enabled. See [ce38b2c](https://github.com/apache/paimon/commit/ce38b2ccd7cbaabcfc1d5e90f68586ebecc66265)

I ran an OSS benchmark with a 529 MB Lumina index, 100 queries, dim=1024.

```
Without Paimon local cache:
- user-level loop: 188.6s
- batch:            39.7s

With warm local memory cache:
- user-level loop: 41.4s
- batch:           39.7s
```

So without the cache, batch is about 4.75x faster than the user-level loop (188.6s vs 39.7s). With a warm cache, the gain is very small (41.4s vs 39.7s, about 4%), which matches your point.

Since users need a batch API, which direction do you prefer for the Spark/Flink batch API?

1. Implement it by looping single-vector search internally and relying on the local cache.
2. Continue this PR and add a batch vector search/readBatch path.
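A toy model may help frame why the two options converge once the cache is warm. The sketch below is purely illustrative: `ToyVectorIndex`, `Searcher`, `search_batch_by_loop` and `search_batch` are hypothetical names, not Paimon APIs, and the model only counts how many times the remote index is read. It ignores real I/O cost, index structure (brute-force L2 stands in for Lumina), and any per-query compute a native batch path might also amortize.

```python
from typing import List, Optional, Sequence

Vector = Sequence[float]


class ToyVectorIndex:
    """Hypothetical stand-in for a remote vector index (not Paimon's API).

    Each call to load() models one fetch of the index from object storage,
    so `loads` counts how many times the remote index would be read.
    """

    def __init__(self, vectors: List[Vector]):
        self._vectors = vectors
        self.loads = 0

    def load(self) -> List[Vector]:
        self.loads += 1
        return self._vectors


def _top_k(data: List[Vector], query: Vector, k: int) -> List[int]:
    # Brute-force squared-L2 search; returns row ids of the k nearest vectors.
    def dist(v: Vector) -> float:
        return sum((a - b) ** 2 for a, b in zip(v, query))

    return sorted(range(len(data)), key=lambda i: dist(data[i]))[:k]


class Searcher:
    def __init__(self, index: ToyVectorIndex, use_local_cache: bool):
        self._index = index
        self._use_cache = use_local_cache
        self._cached: Optional[List[Vector]] = None

    def _data(self) -> List[Vector]:
        # Without a cache, every access goes back to the remote index.
        if not self._use_cache:
            return self._index.load()
        # With a local cache, only the first access reads the remote index.
        if self._cached is None:
            self._cached = self._index.load()
        return self._cached

    def search(self, query: Vector, k: int) -> List[int]:
        return _top_k(self._data(), query, k)

    def search_batch_by_loop(self, queries: List[Vector], k: int) -> List[List[int]]:
        # Option 1: batch API that loops single-vector search internally.
        return [self.search(q, k) for q in queries]

    def search_batch(self, queries: List[Vector], k: int) -> List[List[int]]:
        # Option 2: dedicated batch path that reads the index once per batch.
        data = self._data()
        return [_top_k(data, q, k) for q in queries]
```

In this model, looping without a cache reads the index once per query, while both a dedicated batch path and a cached loop read it once per batch. That mirrors the benchmark shape above, but says nothing about which design is preferable for Paimon.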
