wgcn007 created CALCITE-7353:
--------------------------------

             Summary:  Support  Milvus Calcite Adapter 
                 Key: CALCITE-7353
                 URL: https://issues.apache.org/jira/browse/CALCITE-7353
             Project: Calcite
          Issue Type: New Feature
            Reporter: wgcn007
             Fix For: 1.42.0


h2. Background

 The rapid growth of AI and large-model applications has driven a surge 
in demand for vector similarity search. Databases like PostgreSQL, Redis, 
Doris, and Elasticsearch have already added vector retrieval support. Milvus, 
a high-performance, cloud-native vector database designed for scalable 
Approximate Nearest Neighbor (ANN) search, has been widely adopted. The goal 
of this Jira is to make Milvus more accessible by providing a full SQL 
abstraction layer over it.


h2. Implementation 

 I have completed a proof-of-concept implementation in my repository: a 
Calcite adapter with operator push-down, so that computation executes on the 
Milvus side. Key capabilities include:
 * Vector Search Push-down

 The MilvusVectorSearchRule matches Sort→Project→[Filter]→Scan patterns. 
When ORDER BY contains a vector distance function, the rule fuses the entire 
query (filtering, projection, vector search, sorting, LIMIT) into a single 
MilvusVectorSearch operator that is pushed down to Milvus. Search parameters 
can be supplied through SQL hints:


SELECT book_name, l2_distance(VectorFieldAutoTest, '[0.1, 0.2, 0.3, 0.4]') AS d
FROM milvus.test_vector_search /*+ MILVUS_OPTIONS(nprobe='100000') */
WHERE book_name <> '小王子'
ORDER BY d
LIMIT 3


 * Predicate Push-down: 

Converts scalar field comparison operators (=, <>, >, <, >=, <=, LIKE) and 
logical operators (AND, OR, NOT) into Milvus expression strings for server-side 
filtering:
SELECT * FROM milvus.vector_table WHERE id > 1
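
As a rough illustration of this translation, the sketch below renders 
comparison and logical operators as a Milvus boolean expression string. The 
class and method names (MilvusFilterSketch, cmp, and, or, not) are 
hypothetical and not taken from the adapter, which would presumably walk 
Calcite RexNode trees instead of building strings directly. The operator 
spellings (==, !=, like, and, or, not) follow Milvus's boolean expression 
syntax.

```java
import java.util.Locale;

/** Hypothetical sketch: builds Milvus filter expressions from SQL-style predicates. */
final class MilvusFilterSketch {
  private MilvusFilterSketch() {}

  /** Maps a SQL comparison operator to its spelling in a Milvus expression. */
  static String milvusOp(String sqlOp) {
    switch (sqlOp.toUpperCase(Locale.ROOT)) {
      case "=":
        return "==";
      case "<>":
        return "!=";
      case ">":
      case "<":
      case ">=":
      case "<=":
        return sqlOp;
      case "LIKE":
        return "like";
      default:
        // Anything else is left for Calcite to evaluate after the scan.
        throw new IllegalArgumentException("operator is not pushable: " + sqlOp);
    }
  }

  /** Renders "field op literal"; string literals are double-quoted and escaped. */
  static String cmp(String field, String sqlOp, Object value) {
    return field + " " + milvusOp(sqlOp) + " " + literal(value);
  }

  static String and(String left, String right) {
    return "(" + left + " and " + right + ")";
  }

  static String or(String left, String right) {
    return "(" + left + " or " + right + ")";
  }

  static String not(String operand) {
    return "not (" + operand + ")";
  }

  private static String literal(Object value) {
    if (value instanceof String) {
      String s = (String) value;
      return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }
    return String.valueOf(value);
  }

  public static void main(String[] args) {
    // WHERE id > 1 AND book_name <> 'The Little Prince'
    System.out.println(and(cmp("id", ">", 1), cmp("book_name", "<>", "The Little Prince")));
  }
}
```

Running main prints (id > 1 and book_name != "The Little Prince"). An operator 
outside the supported set raises an exception here; in a planner rule the 
equivalent would be to decline the match and leave the filter to Calcite.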

 * Projection Push-down: 

Supports constant and column name projection push-down to minimize data 
transfer:
SELECT book_name, 'content' FROM milvus.vector_table




 * Vector UDF Fallback: Adds the distance UDFs L2_DISTANCE(), 
COSINE_DISTANCE(), and INNER_PRODUCT(). For operators that are not pushed down 
(JOIN/UNION/AGG) and for other non-push-down cases (unsupported UDFs, 
least-similar vector queries), vector computation falls back to in-memory 
execution, so these queries still run. Example:

SELECT book_name, l2_distance(vector_field, '[...]') d
FROM milvus.vector_table
ORDER BY d DESC LIMIT 5  -- Find least similar Top N
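
To make the in-memory fallback concrete, here is a minimal sketch of the three 
distance computations and of parsing a vector literal such as '[0.1, 0.2]'. 
All names (VectorDistanceSketch, parseVector, and so on) are hypothetical. 
Whether the adapter's L2_DISTANCE() takes the square root, and whether 
COSINE_DISTANCE() returns the similarity or 1 minus the similarity, is not 
stated in this issue; the sketch assumes Euclidean distance and plain cosine 
similarity.

```java
/** Hypothetical sketch of in-memory vector distance UDF logic. */
final class VectorDistanceSketch {
  private VectorDistanceSketch() {}

  /** Parses a literal like "[0.1, 0.2, 0.3]" into a float array. */
  static float[] parseVector(String literal) {
    String s = literal.trim();
    if (s.length() < 2 || s.charAt(0) != '[' || s.charAt(s.length() - 1) != ']') {
      throw new IllegalArgumentException("expected a literal like [0.1, 0.2]: " + literal);
    }
    String body = s.substring(1, s.length() - 1).trim();
    if (body.isEmpty()) {
      return new float[0];
    }
    String[] parts = body.split(",");
    float[] v = new float[parts.length];
    for (int i = 0; i < parts.length; i++) {
      v[i] = Float.parseFloat(parts[i].trim());
    }
    return v;
  }

  /** Euclidean distance (assumption: square root applied). */
  static double l2Distance(float[] a, float[] b) {
    checkSameLength(a, b);
    double sum = 0.0;
    for (int i = 0; i < a.length; i++) {
      double d = (double) a[i] - b[i];
      sum += d * d;
    }
    return Math.sqrt(sum);
  }

  /** Inner (dot) product; larger means more similar. */
  static double innerProduct(float[] a, float[] b) {
    checkSameLength(a, b);
    double sum = 0.0;
    for (int i = 0; i < a.length; i++) {
      sum += (double) a[i] * b[i];
    }
    return sum;
  }

  /** Cosine similarity in [-1, 1]; returns NaN if either vector has zero norm. */
  static double cosineSimilarity(float[] a, float[] b) {
    double norms = Math.sqrt(innerProduct(a, a)) * Math.sqrt(innerProduct(b, b));
    return innerProduct(a, b) / norms;
  }

  private static void checkSameLength(float[] a, float[] b) {
    if (a.length != b.length) {
      throw new IllegalArgumentException(
          "dimension mismatch: " + a.length + " vs " + b.length);
    }
  }

  public static void main(String[] args) {
    float[] query = parseVector("[3, 4]");
    System.out.println(l2Distance(new float[] {0f, 0f}, query));
  }
}
```

In a fallback plan, a function of this kind would be evaluated per row by 
Calcite's enumerable runtime, with the sort and LIMIT applied afterwards in 
memory.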



Core Components:
 * Milvus Metadata Layer:

 * MilvusSchema: Manages all collections in a Milvus Database, responsible for 
automatic collection discovery
 * MilvusTranslatableTable (Core): Table abstraction bridging Milvus 
collections and Calcite tables, providing field metadata, type mapping, and 
creating MilvusTableScan nodes in toRel()

 * Milvus SQL Operators & Rules Layer:

 * MilvusTableScan: Handles Milvus data scanning and SQL Hint storage
 * MilvusFilter / MilvusFilterRule: Implements predicate push-down, converting 
SQL conditions to Milvus expressions
 * MilvusProject / MilvusProjectRule: Supports constant and column name 
projection push-down
 * MilvusVectorSearch / MilvusVectorSearchRule (Core Optimization Rule): 
Identifies vector query patterns, validates the sort direction against the 
distance type (ASC for L2, DESC for IP and COSINE), and fuses the entire query 
into a single operator for push-down
 * MilvusToEnumerableConverterRule / MilvusToEnumerableConverter: Code 
generation layer that converts Milvus physical operators to executable Java 
code, generating table.vectorSearch() or table.scan() calls
 * MilvusRel: Defines the Milvus adapter's calling convention and serves as 
the common interface for all Milvus operators

 * Milvus Query Execution Layer:

 * MilvusSearchEnumerator / MilvusQueryEnumerator: Correspond to the Search 
(vector retrieval) and Query (paginated scanning) operations in Milvus, 
respectively.

 * Vector UDF Layer:

 * MilvusOperatorTable / MilvusVectorFunction / MilvusVectorUdfs: Implements 
registration, declaration and computation logic for L2/Cosine/IP distance 
functions
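
The sort-direction validation described for MilvusVectorSearchRule can be 
sketched as follows. The enum and method names are hypothetical; only the 
pairing (ASC for L2, DESC for IP and COSINE) comes from the description above. 
The idea is that a nearest-neighbor search returns the closest vectors first, 
which means the smallest L2 distance or the largest IP/COSINE score, so any 
other ordering cannot be served by the index and takes the in-memory fallback.

```java
/** Hypothetical sketch of the ORDER BY direction check before push-down. */
final class SortDirectionSketch {
  private SortDirectionSketch() {}

  enum Metric { L2, IP, COSINE }

  enum Direction { ASC, DESC }

  /** The only ORDER BY direction a nearest-neighbor search can serve for a metric. */
  static Direction pushableDirection(Metric metric) {
    // L2: smaller is closer. IP and COSINE: larger is more similar.
    return metric == Metric.L2 ? Direction.ASC : Direction.DESC;
  }

  /** True if the requested ordering matches what the vector index returns. */
  static boolean canPushDown(Metric metric, Direction requested) {
    return pushableDirection(metric) == requested;
  }

  public static void main(String[] args) {
    // ORDER BY l2_distance(...) DESC asks for the least similar rows,
    // so this check fails and the query would not be fused.
    System.out.println(canPushDown(Metric.L2, Direction.DESC));
  }
}
```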

h2. Use Cases:
 * Milvus SQL Gateway Service: Provides a standard SQL interface for reading 
and writing Milvus data and for collection management, improving usability and 
compatibility with existing SQL ecosystem tools.

 * Integration with an Internally Developed Multimodal Compute Engine: Serves 
as the vector search execution engine inside internal compute platforms, 
extending the engine's functionality and improving vector retrieval 
performance.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
