JingsongLi opened a new pull request, #8258:
URL: https://github.com/apache/paimon/pull/8258

   ## Summary
   
   Add first-phase Spark SQL support for multi-vector search over multiple vector columns. The implementation fans out to existing per-column global vector indexes and fuses the scored row ids before the normal Paimon scan, so it does not require any index format changes.
   
   ## Changes
   
   - Add `multi_vector_search` table-valued function with query-map syntax, final limit, and optional fusion options.
   - Add multi-vector predicate objects and fusion utilities supporting `rrf` and `weighted_score` with optional per-column weights.
   - Extend Spark scan plumbing so a `VectorSearchTable` can carry either a single vector search or a multi-vector search.
   - Reuse existing vector index builders for each route, including partition/data prefilters and query-time vector index options.
   - Add parser/unit coverage and an end-to-end Spark SQL test that builds two vector indexes and queries both columns.
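   
   The fusion code itself is not part of this description. As a rough, hypothetical illustration of what the two strategies compute, the Java sketch below assumes each route returns its hits already ranked best-first and capped at `route_limit`, that `rrf` uses the textbook score `w / (k + rank)` with ranks starting at 1 (k = 60 is a common choice, not a confirmed default), and that `weighted_score` is an unnormalized weighted sum of raw per-column scores. The class and method names are invented for this sketch; the actual utilities covered by `MultiVectorSearchFusionTest` may normalize scores, pick a different `k`, or break ties differently.
   
   ```java
   import java.util.ArrayList;
   import java.util.Comparator;
   import java.util.HashMap;
   import java.util.List;
   import java.util.Map;
   
   /** Hypothetical sketch of multi-vector fusion; not the PR's actual implementation. */
   public class MultiVectorFusionSketch {
   
       /** One scored row id returned by a single per-column vector index route. */
       public static final class Hit {
           final long rowId;
           final double score;
   
           public Hit(long rowId, double score) {
               this.rowId = rowId;
               this.score = score;
           }
   
           public long rowId() { return rowId; }
   
           public double score() { return score; }
       }
   
       /** Reciprocal rank fusion: sum of w_i / (k + rank_i) over routes; weights default to 1.0. */
       public static List<Hit> rrf(List<List<Hit>> routes, double[] weights, int k, int limit) {
           Map<Long, Double> fused = new HashMap<>();
           for (int i = 0; i < routes.size(); i++) {
               double w = weights == null ? 1.0 : weights[i];
               List<Hit> route = routes.get(i);
               for (int rank = 1; rank <= route.size(); rank++) {
                   // Routes are assumed to be sorted best-first, so list position is the rank.
                   fused.merge(route.get(rank - 1).rowId, w / (k + rank), Double::sum);
               }
           }
           return topK(fused, limit);
       }
   
       /** Weighted score fusion: sum of w_i * score_i over the routes that returned the row. */
       public static List<Hit> weightedScore(List<List<Hit>> routes, double[] weights, int limit) {
           Map<Long, Double> fused = new HashMap<>();
           for (int i = 0; i < routes.size(); i++) {
               double w = weights == null ? 1.0 : weights[i];
               for (Hit h : routes.get(i)) {
                   fused.merge(h.rowId, w * h.score, Double::sum);
               }
           }
           return topK(fused, limit);
       }
   
       /** Sorts fused scores descending (ties broken by row id) and applies the final limit. */
       private static List<Hit> topK(Map<Long, Double> fused, int limit) {
           List<Hit> out = new ArrayList<>();
           for (Map.Entry<Long, Double> e : fused.entrySet()) {
               out.add(new Hit(e.getKey(), e.getValue()));
           }
           out.sort(Comparator.comparingDouble(Hit::score).reversed()
                   .thenComparingLong(Hit::rowId));
           return new ArrayList<>(out.subList(0, Math.min(limit, out.size())));
       }
   }
   ```
   
   With title hits `[(1, 0.9), (2, 0.8)]` and body hits `[(2, 0.95), (3, 0.7)]`, both strategies rank row 2 first in this data, since it is the only row returned by both routes and so collects two contributions.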
   
   Example:
   
   ```sql
   SELECT id, __paimon_vector_search_score
   FROM multi_vector_search(
     'T',
     map(
       'title_vec', array(1.0f, 0.0f),
       'body_vec', array(0.0f, 1.0f)),
     2,
     map('fusion', 'rrf', 'route_limit', '2'))
   ```
   
   ## Testing
   
   - [x] `mvn -pl paimon-common -Pfast-build -Dtest=MultiVectorSearchFusionTest test`
   - [x] `mvn -pl paimon-spark/paimon-spark-common -am -Pfast-build -DfailIfNoTests=false -DwildcardSuites=org.apache.paimon.spark.catalyst.plans.logical.VectorSearchQueryTest -Dtest=none test`
   - [x] `mvn -Pspark3 -pl :paimon-spark-common_2.12,:paimon-spark3-common_2.12,:paimon-spark-ut_2.12 -am -Pfast-build -DfailIfNoTests=false -DwildcardSuites=org.apache.paimon.spark.sql.MultiVectorSearchTest -Dtest=none test`
   - [x] `mvn -Pspark3 -pl :paimon-spark-3.2_2.12,:paimon-spark-3.3_2.12 -am -Pfast-build -DskipTests compile`
   - [x] `git diff --check`
   