wgcn007 created CALCITE-7353:
--------------------------------
Summary: Support Milvus Calcite Adapter
Key: CALCITE-7353
URL: https://issues.apache.org/jira/browse/CALCITE-7353
Project: Calcite
Issue Type: New Feature
Reporter: wgcn007
Fix For: 1.42.0
h2. Background
The exponential growth of AI and large model applications has driven a surge
in demand for vector similarity search. Databases like PostgreSQL, Redis,
Doris, and Elasticsearch have already added vector retrieval support. Milvus,
as a high-performance, cloud-native vector database designed for scalable
Approximate Nearest Neighbor (ANN) search, has been widely adopted. The goal
of this Jira is to make Milvus more accessible by creating a full SQL
abstraction layer.
h2. Implementation
I have completed a demo implementation in my repository that validates
feasibility: a Calcite adapter with operator push-down, so that computation
executes on the Milvus side. Key capabilities include:
* Vector Search Push-down:
The MilvusVectorSearchRule identifies Sort→Project→[Filter]→Scan patterns.
When ORDER BY contains a vector distance function, it fuses the entire query
(filtering, projection, vector search, sorting, LIMIT) into a single
MilvusVectorSearch operator pushed to Milvus, with SQL Hint support for search
parameters:
SELECT book_name, l2_distance(VectorFieldAutoTest, '[0.1, 0.2, 0.3, 0.4]') AS d
FROM milvus.test_vector_search /*+ MILVUS_OPTIONS(nprobe='100000') */
WHERE book_name <> '小王子'
ORDER BY d
LIMIT 3
* Predicate Push-down:
Converts scalar field comparison operators (=, <>, >, <, >=, <=, LIKE) and
logical operators (AND, OR, NOT) into Milvus expression strings for server-side
filtering:
SELECT * FROM milvus.vector_table WHERE id > 1
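The translation described above can be sketched roughly as follows. This is only an illustration, not the code from the demo repository: the class MilvusExprSketch and its small predicate tree are hypothetical stand-ins for walking Calcite RexNode trees, and the sketch assumes Milvus boolean expressions spell equality as ==, inequality as !=, and the logical operators as and/or/not. LIKE patterns are passed through unchanged, which is also an assumption.

```java
import java.util.Map;

/** Hypothetical sketch: renders a tiny predicate tree as a Milvus filter expression string. */
final class MilvusExprSketch {
    /** Minimal predicate tree; a real adapter would walk Calcite RexNode instead. */
    interface Pred {}
    record Compare(String field, String sqlOp, Object literal) implements Pred {}
    record And(Pred left, Pred right) implements Pred {}
    record Or(Pred left, Pred right) implements Pred {}
    record Not(Pred operand) implements Pred {}

    // SQL comparison operator -> assumed Milvus expression operator.
    private static final Map<String, String> OPS = Map.of(
        "=", "==", "<>", "!=", ">", ">", "<", "<",
        ">=", ">=", "<=", "<=", "LIKE", "like");

    static String translate(Pred p) {
        if (p instanceof Compare c) {
            String op = OPS.get(c.sqlOp());
            if (op == null) {
                // Anything outside the supported set is left for Calcite to evaluate.
                throw new IllegalArgumentException("not pushable: " + c.sqlOp());
            }
            return c.field() + " " + op + " " + literal(c.literal());
        }
        if (p instanceof And a) {
            return "(" + translate(a.left()) + " and " + translate(a.right()) + ")";
        }
        if (p instanceof Or o) {
            return "(" + translate(o.left()) + " or " + translate(o.right()) + ")";
        }
        if (p instanceof Not n) {
            return "not (" + translate(n.operand()) + ")";
        }
        throw new IllegalArgumentException("unknown predicate: " + p);
    }

    // String literals are double-quoted with backslash escaping; other values use toString().
    private static String literal(Object v) {
        if (v instanceof String s) {
            return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
        }
        return String.valueOf(v);
    }
}
```

Under these assumptions, the WHERE clause id > 1 from the query above renders as the expression string id > 1, and a comparison using <> renders with !=.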
* Projection Push-down:
Supports constant and column name projection push-down to minimize data
transfer:
SELECT book_name, 'content' FROM milvus.vector_table
* Vector UDF Fallback:
Adds the distance calculation UDFs L2_DISTANCE(), COSINE_DISTANCE(), and
INNER_PRODUCT(). For complex operators (JOIN/UNION/AGG) or scenarios that
cannot be pushed down (unsupported UDFs, least-similar vector queries),
vector computation falls back to in-memory execution, so these queries still
return results. Example:
SELECT book_name, l2_distance(vector_field, '[...]') d
FROM milvus.vector_table
ORDER BY d DESC LIMIT 5 -- Find least similar Top N
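For the in-memory fallback, the distance computation could look roughly like the sketch below. It is not the code behind MilvusVectorUdfs: the class and method names are hypothetical, the vector literal is assumed to be a bracketed, comma-separated list as in the queries above, L2 is assumed to be the plain Euclidean distance (with the square root), and the cosine function is written as cosine similarity, since the issue does not say whether COSINE_DISTANCE() returns the similarity or one minus it.

```java
/** Hypothetical sketch of in-memory vector distance functions for the fallback path. */
final class VectorDistanceSketch {
    /** Parses a literal such as "[0.1, 0.2, 0.3]" into a float array. */
    static float[] parse(String literal) {
        String body = literal.trim();
        if (!body.startsWith("[") || !body.endsWith("]")) {
            throw new IllegalArgumentException("expected a bracketed vector: " + literal);
        }
        body = body.substring(1, body.length() - 1).trim();
        if (body.isEmpty()) {
            return new float[0];
        }
        String[] parts = body.split(",");
        float[] out = new float[parts.length];
        for (int i = 0; i < parts.length; i++) {
            out[i] = Float.parseFloat(parts[i].trim());
        }
        return out;
    }

    /** Dot product of two vectors of equal length. */
    static double innerProduct(float[] a, float[] b) {
        checkSameLength(a, b);
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            sum += (double) a[i] * b[i];
        }
        return sum;
    }

    /** Euclidean distance; assumed here to include the square root. */
    static double l2Distance(float[] a, float[] b) {
        checkSameLength(a, b);
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = (double) a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** Cosine similarity in [-1, 1]; rejects zero vectors, for which it is undefined. */
    static double cosineSimilarity(float[] a, float[] b) {
        double denom = Math.sqrt(innerProduct(a, a)) * Math.sqrt(innerProduct(b, b));
        if (denom == 0) {
            throw new IllegalArgumentException("cosine is undefined for a zero vector");
        }
        return innerProduct(a, b) / denom;
    }

    private static void checkSameLength(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch: " + a.length + " vs " + b.length);
        }
    }
}
```

A fallback plan would evaluate one of these per row and let Calcite's in-memory sort handle ORDER BY ... DESC LIMIT N, which is how a least-similar query like the one above can still be answered.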
Core Components:
* Milvus Metadata Layer:
* MilvusSchema: Manages all collections in a Milvus Database, responsible for
automatic collection discovery
* MilvusTranslatableTable (Core): Table abstraction bridging Milvus
collections and Calcite tables, providing field metadata, type mapping, and
creating MilvusTableScan nodes in toRel()
* Milvus SQL Operators & Rules Layer:
* MilvusTableScan: Handles Milvus data scanning and SQL Hint storage
* MilvusFilter / MilvusFilterRule: Implements predicate push-down, converting
SQL conditions to Milvus expressions
* MilvusProject / MilvusProjectRule: Supports constant and column name
projection push-down
* MilvusVectorSearch / MilvusVectorSearchRule (Core Optimization Rule):
Identifies vector query patterns, validates the sort direction against the
distance type (ASC for L2, DESC for IP and COSINE), and fuses the entire query
into a single operator for push-down
* MilvusToEnumerableConverterRule / MilvusToEnumerableConverter: Code
generation layer that converts Milvus physical operators to executable Java
code, generating table.vectorSearch() or table.scan() calls
* MilvusRel: Defines the Milvus adapter calling convention and serves as the
unified interface for all Milvus operators
* Milvus Query Execution Layer:
* MilvusSearchEnumerator / MilvusQueryEnumerator: Correspond to the Milvus
Search (vector retrieval) and Query (paginated scanning) operations,
respectively.
* Vector UDF Layer:
* MilvusOperatorTable / MilvusVectorFunction / MilvusVectorUdfs: Implements
registration, declaration and computation logic for L2/Cosine/IP distance
functions
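The sort-direction validation that MilvusVectorSearchRule performs can be illustrated with a small sketch. The names below (SortDirectionCheckSketch, Metric, canPushDown) are hypothetical and not taken from the demo repository. The only logic shown is the pairing stated above: an L2 ordering is pushable when ascending, and IP or COSINE orderings are pushable when descending. Any other combination would stay on the in-memory fallback path.

```java
import java.util.Locale;

/** Hypothetical sketch of the direction check done before fusing a query into a vector search. */
final class SortDirectionCheckSketch {
    enum Metric { L2, IP, COSINE }

    /** Maps the UDF names from this issue to a metric; other names are rejected. */
    static Metric fromFunctionName(String name) {
        switch (name.toUpperCase(Locale.ROOT)) {
            case "L2_DISTANCE":
                return Metric.L2;
            case "INNER_PRODUCT":
                return Metric.IP;
            case "COSINE_DISTANCE":
                return Metric.COSINE;
            default:
                throw new IllegalArgumentException("not a vector distance function: " + name);
        }
    }

    /**
     * Returns true when ORDER BY asks for the nearest neighbours first:
     * ascending for L2 (smaller is closer), descending for IP and COSINE
     * (larger is closer).
     */
    static boolean canPushDown(Metric metric, boolean descending) {
        boolean smallerIsCloser = metric == Metric.L2;
        return smallerIsCloser != descending;
    }
}
```

With this check, ORDER BY l2_distance(...) ASC qualifies for push-down, while ORDER BY l2_distance(...) DESC (the least-similar case) does not.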
h2. Use Cases:
* Build a Milvus SQL Gateway Service: Provides a standard SQL interface for
Milvus data read/write and collection management, improving usability and
compatibility with existing SQL ecosystem tools.
* Integration with an Internally Developed Multimodal Compute Engine: Serves
as a vector search execution engine integrated into internal compute
platforms, extending the engine's functionality and improving vector
retrieval performance.