JingsongLi opened a new pull request, #8255: URL: https://github.com/apache/paimon/pull/8255
## Summary

Adds an opt-in raw fallback path for vector search so queries can merge indexed hits with exact scores from current raw vector rows that are not yet covered by vector indexes. The fallback is disabled by default and is enabled by setting `vector.raw-fallback.enabled=true`.

## Changes

- Add raw fallback evaluation in `VectorReadImpl`: discover current data row-id ranges, exclude vector-indexed ranges, read remaining raw vectors with row ids, compute exact scores, and merge final topK.
- Keep the existing index-only behavior unchanged unless `vector.raw-fallback.enabled=true` is passed.
- Preserve freshness with filters by using partition-only range discovery and applying normal predicates while reading raw rows.
- Pass partition filters through vector reads and route Spark vector reads to the local hybrid path when raw fallback is enabled.
- Add regression coverage for default-off behavior and filtered unindexed rows.

## Testing

- `mvn -pl paimon-core -am -Pfast-build -DfailIfNoTests=false -Dtest=VectorSearchBuilderTest#testVectorSearchRawFallbackRequiresExplicitOption test`
- `mvn -pl paimon-core -am -Pfast-build -DfailIfNoTests=false -Dtest=VectorSearchBuilderTest#testVectorSearchRawFallbackScansFilteredUnindexedData test`
- `mvn -pl paimon-core -am -DskipTests compile`
- `mvn -pl paimon-spark/paimon-spark-common -am -DskipTests compile`
- `git diff --check`

## Notes

A full local `VectorSearchBuilderTest` run was also attempted; it still fails in the existing `testVectorSearchWithPartitionFilter` path with a codegen/plugin-loader NPE in `PluginLoader.discover`, unrelated to this raw fallback change.
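## Illustrative sketch

For readers unfamiliar with the approach, the fallback steps listed under Changes (subtract indexed row-id ranges from the data ranges, score the remaining raw rows exactly, merge into one topK) can be sketched roughly as follows. This is not the code in `VectorReadImpl`: the class `RawFallbackSketch`, its methods, the `long[]` range encoding, the in-memory map standing in for a raw read, and the dot-product score are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical sketch; none of these names are Paimon APIs.
class RawFallbackSketch {

    /** A scored row. In this sketch a higher score is a better match. */
    static final class Hit {
        final long rowId;
        final float score;
        Hit(long rowId, float score) { this.rowId = rowId; this.score = score; }
    }

    /**
     * Returns the parts of data not covered by indexed. Ranges are inclusive
     * {from, to} pairs; both lists must be sorted and non-overlapping.
     */
    static List<long[]> subtract(List<long[]> data, List<long[]> indexed) {
        List<long[]> out = new ArrayList<>();
        for (long[] d : data) {
            long cursor = d[0];
            for (long[] i : indexed) {
                if (i[1] < cursor) continue; // ends before the uncovered part
                if (i[0] > d[1]) break;      // starts after this data range
                if (i[0] > cursor) out.add(new long[] {cursor, i[0] - 1});
                cursor = i[1] + 1;
                if (cursor > d[1]) break;
            }
            if (cursor <= d[1]) out.add(new long[] {cursor, d[1]});
        }
        return out;
    }

    static boolean contains(List<long[]> ranges, long rowId) {
        for (long[] r : ranges) {
            if (rowId >= r[0] && rowId <= r[1]) return true;
        }
        return false;
    }

    /** Exact score; dot product is an assumption, the real metric may differ. */
    static float dot(float[] a, float[] b) {
        float s = 0f;
        for (int k = 0; k < a.length; k++) s += a[k] * b[k];
        return s;
    }

    /** Scores raw rows in the unindexed ranges and merges them with indexed hits. */
    static List<Hit> mergeTopK(List<Hit> indexedHits, Map<Long, float[]> rawRows,
                               List<long[]> unindexed, float[] query, int k) {
        // Min-heap on score, so the worst retained hit is evicted first.
        PriorityQueue<Hit> heap =
                new PriorityQueue<>((a, b) -> Float.compare(a.score, b.score));
        List<Hit> candidates = new ArrayList<>(indexedHits);
        for (Map.Entry<Long, float[]> e : rawRows.entrySet()) {
            // Rows inside indexed ranges are skipped to avoid double counting.
            if (contains(unindexed, e.getKey())) {
                candidates.add(new Hit(e.getKey(), dot(query, e.getValue())));
            }
        }
        for (Hit h : candidates) {
            heap.add(h);
            if (heap.size() > k) heap.poll();
        }
        List<Hit> result = new ArrayList<>(heap);
        result.sort((a, b) -> Float.compare(b.score, a.score)); // best first
        return result;
    }

    public static void main(String[] args) {
        List<long[]> data = new ArrayList<>();
        data.add(new long[] {0, 9});
        List<long[]> indexed = new ArrayList<>();
        indexed.add(new long[] {3, 5});
        List<long[]> unindexed = subtract(data, indexed);

        Map<Long, float[]> rawRows = new HashMap<>();
        rawRows.put(7L, new float[] {2f, 0f});
        List<Hit> indexedHits = new ArrayList<>();
        indexedHits.add(new Hit(4L, 1f));
        for (Hit h : mergeTopK(indexedHits, rawRows, unindexed, new float[] {1f, 0f}, 2)) {
            System.out.println(h.rowId + " " + h.score);
        }
    }
}
```

The sketch keeps the index-only path untouched: with an empty `unindexed` list, `mergeTopK` simply returns the indexed hits ordered by score, which mirrors the default-off behavior described above.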
