JingsongLi opened a new pull request, #8255:
URL: https://github.com/apache/paimon/pull/8255

   ## Summary
   
   Adds an opt-in raw fallback path for vector search. With it, queries can
   merge indexed hits with exact scores computed over current raw vector rows
   that are not yet covered by a vector index. The fallback is disabled by
   default and is enabled through the `vector.raw-fallback.enabled` option.
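   A minimal sketch of the merge described above. It assumes an inner-product
   metric where a higher score is better, and uses a size-bounded min-heap for
   top-K. `RawFallbackTopK`, `Hit`, `innerProduct`, and `mergeTopK` are
   hypothetical names for illustration, not the actual `VectorReadImpl` code:

   ```java
   import java.util.ArrayList;
   import java.util.HashSet;
   import java.util.List;
   import java.util.Map;
   import java.util.PriorityQueue;
   import java.util.Set;

   /** Hypothetical sketch: merge indexed hits with exact scores for unindexed raw rows. */
   final class RawFallbackTopK {

       /** A scored candidate; assumes a higher score means a better match. */
       record Hit(long rowId, float score) {}

       /** Exact inner-product score; the real metric depends on the index (assumption). */
       static float innerProduct(float[] a, float[] b) {
           if (a.length != b.length) {
               throw new IllegalArgumentException("dimension mismatch");
           }
           float sum = 0f;
           for (int i = 0; i < a.length; i++) {
               sum += a[i] * b[i];
           }
           return sum;
       }

       /** Scores raw rows exactly, then keeps the k best of all candidates, best first. */
       static List<Hit> mergeTopK(
               List<Hit> indexedHits, Map<Long, float[]> rawRows, float[] query, int k) {
           // Min-heap on score: the root is the weakest hit in the current top-k.
           PriorityQueue<Hit> heap =
                   new PriorityQueue<>((x, y) -> Float.compare(x.score(), y.score()));
           Set<Long> seen = new HashSet<>();
           for (Hit hit : indexedHits) {
               if (seen.add(hit.rowId())) {
                   offer(heap, hit, k);
               }
           }
           for (Map.Entry<Long, float[]> row : rawRows.entrySet()) {
               // Raw rows come from unindexed ranges, so duplicates are unexpected; skip anyway.
               if (seen.add(row.getKey())) {
                   offer(heap, new Hit(row.getKey(), innerProduct(query, row.getValue())), k);
               }
           }
           List<Hit> result = new ArrayList<>(heap);
           result.sort((x, y) -> Float.compare(y.score(), x.score()));
           return result;
       }

       private static void offer(PriorityQueue<Hit> heap, Hit hit, int k) {
           if (heap.size() < k) {
               heap.add(hit);
           } else if (k > 0 && hit.score() > heap.peek().score()) {
               heap.poll(); // evict the current weakest hit
               heap.add(hit);
           }
       }
   }
   ```

   Keeping only k candidates in the heap bounds memory by the result size
   rather than by the number of raw rows scanned.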
   
   ## Changes
   
   - Add raw fallback evaluation in `VectorReadImpl`: discover the current
     data row-id ranges, exclude ranges already covered by vector indexes,
     read the remaining raw vectors with their row ids, compute exact scores,
     and merge them with the indexed hits into the final topK.
   - Keep the existing index-only behavior unchanged unless
     `vector.raw-fallback.enabled=true` is set.
   - Preserve freshness under filters: discover row-id ranges using partition
     filters only, and apply the normal predicates while reading raw rows.
   - Pass partition filters through vector reads, and route Spark vector reads
     to the local hybrid path when raw fallback is enabled.
   - Add regression coverage for the default-off behavior and for filtered
     unindexed rows.
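   The range-exclusion step can be sketched as a sorted interval subtraction.
   This assumes inclusive `[from, to]` row-id ranges that are sorted and
   non-overlapping within each input; `UnindexedRanges`, `RowRange`, and
   `subtract` are hypothetical names, not Paimon APIs:

   ```java
   import java.util.ArrayList;
   import java.util.List;

   /** Hypothetical sketch: which row-id ranges still need a raw (unindexed) scan. */
   final class UnindexedRanges {

       /** Inclusive row-id range [from, to]; an assumed representation. */
       record RowRange(long from, long to) {
           RowRange {
               if (from > to) {
                   throw new IllegalArgumentException("from > to");
               }
           }
       }

       /**
        * Returns the parts of {@code data} not covered by {@code indexed}. Both lists
        * must be sorted by {@code from} and non-overlapping within themselves.
        */
       static List<RowRange> subtract(List<RowRange> data, List<RowRange> indexed) {
           List<RowRange> result = new ArrayList<>();
           int j = 0;
           for (RowRange d : data) {
               // Skip indexed ranges that end before this data range starts.
               while (j < indexed.size() && indexed.get(j).to() < d.from()) {
                   j++;
               }
               long cursor = d.from(); // first row id not yet classified
               boolean tailCovered = false;
               for (int k = j; k < indexed.size() && indexed.get(k).from() <= d.to(); k++) {
                   RowRange idx = indexed.get(k);
                   if (idx.from() > cursor) {
                       // Gap before this indexed range: rows that need a raw read.
                       result.add(new RowRange(cursor, idx.from() - 1));
                   }
                   if (idx.to() >= d.to()) {
                       tailCovered = true; // the rest of d is indexed
                       break;
                   }
                   cursor = Math.max(cursor, idx.to() + 1);
               }
               if (!tailCovered) {
                   result.add(new RowRange(cursor, d.to()));
               }
           }
           return result;
       }
   }
   ```

   Only the returned ranges would then need a raw read and exact scoring; the
   linear merge keeps this step proportional to the number of ranges, not rows.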
   
   ## Testing
   
   - `mvn -pl paimon-core -am -Pfast-build -DfailIfNoTests=false -Dtest=VectorSearchBuilderTest#testVectorSearchRawFallbackRequiresExplicitOption test`
   - `mvn -pl paimon-core -am -Pfast-build -DfailIfNoTests=false -Dtest=VectorSearchBuilderTest#testVectorSearchRawFallbackScansFilteredUnindexedData test`
   - `mvn -pl paimon-core -am -DskipTests compile`
   - `mvn -pl paimon-spark/paimon-spark-common -am -DskipTests compile`
   - `git diff --check`
   
   ## Notes
   
   A full local `VectorSearchBuilderTest` run was also attempted; it still
   fails in the existing `testVectorSearchWithPartitionFilter` path with a
   codegen/plugin-loader NPE in `PluginLoader.discover`, unrelated to this raw
   fallback change.


-- 
This is an automated message from the Apache Git Service.