[ 
https://issues.apache.org/jira/browse/CALCITE-7353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wgcn007 updated CALCITE-7353:
-----------------------------
    Description: 
h2. Background

The rapid growth of AI and large-model applications has driven a surge in 
demand for vector similarity search. Databases such as PostgreSQL, Redis, 
Doris, and Elasticsearch have already added vector retrieval support. Milvus, 
a high-performance, cloud-native vector database designed for scalable 
Approximate Nearest Neighbor (ANN) search, has been widely adopted. The goal 
of this Jira is to make Milvus more accessible by creating a full SQL 
abstraction layer.
h2. Implementation

I have completed a demo implementation that validates feasibility in 
[my repository|https://github.com/wg1026688210/calcite/commits/dev/]. It builds 
a Calcite adapter with operator push-down so that computation executes on the 
Milvus side. Key capabilities include:
 * Vector Search Push-down:

The MilvusVectorSearchRule identifies Sort→Project→[Filter]→Scan patterns. When 
ORDER BY contains a vector distance function, it fuses the entire query 
(filtering, projection, vector search, sorting, LIMIT) into a single 
MilvusVectorSearch operator pushed to Milvus, with SQL Hint support for search 
parameters:

 
{code:sql}
SELECT book_name, l2_distance(VectorFieldAutoTest, '[0.1, 0.2, 0.3, 0.4]') AS d
FROM milvus.test_vector_search /*+ MILVUS_OPTIONS(nprobe='100000') */
WHERE book_name <> '小王子'
ORDER BY d
LIMIT 3
{code}
 

 
 * Predicate Push-down:

Converts scalar field comparison operators (=, <>, >, <, >=, <=, LIKE) and 
logical operators (AND, OR, NOT) into Milvus expression strings for server-side 
filtering:

 
{code:sql}
SELECT * FROM milvus.vector_table WHERE id > 1
{code}
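
The translation from SQL conditions to a Milvus expression string could look roughly like the sketch below. It is not the adapter's code: {{MilvusExprSketch}} and its tiny predicate model are hypothetical stand-ins for walking Calcite RexNode trees, and the target syntax ({{==}}, {{!=}}, {{and}}, {{or}}, {{not}}, {{like}}) is assumed from Milvus boolean expressions.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.stream.Collectors;

/** Hypothetical sketch: a mini predicate model standing in for Calcite RexNode trees. */
final class MilvusExprSketch {
  /** A predicate that can render itself as a Milvus expression string. */
  interface Pred {
    String toMilvus();
  }

  /** column OP literal, where sqlOp is a SQL comparison operator or LIKE. */
  static Pred cmp(String column, String sqlOp, Object literal) {
    return () -> column + " " + mapOp(sqlOp) + " " + quote(literal);
  }

  static Pred and(Pred... ps) {
    return () -> join(" and ", ps);
  }

  static Pred or(Pred... ps) {
    return () -> join(" or ", ps);
  }

  static Pred not(Pred p) {
    return () -> "not (" + p.toMilvus() + ")";
  }

  /** Maps a SQL operator to its assumed Milvus spelling; anything else is not pushable. */
  private static String mapOp(String sqlOp) {
    switch (sqlOp.toUpperCase(Locale.ROOT)) {
      case "=": return "==";
      case "<>": return "!=";
      case ">": case "<": case ">=": case "<=": return sqlOp;
      case "LIKE": return "like";
      default: throw new UnsupportedOperationException("not pushable: " + sqlOp);
    }
  }

  /** Numbers are emitted as-is; everything else becomes a double-quoted, escaped string. */
  private static String quote(Object literal) {
    if (literal instanceof Number) {
      return literal.toString();
    }
    String s = literal.toString().replace("\\", "\\\\").replace("\"", "\\\"");
    return "\"" + s + "\"";
  }

  /** Parenthesizes each operand so operator precedence never changes the meaning. */
  private static String join(String sep, Pred[] ps) {
    return Arrays.stream(ps)
        .map(p -> "(" + p.toMilvus() + ")")
        .collect(Collectors.joining(sep));
  }
}
```

With this sketch, {{and(cmp("id", ">", 1), cmp("book_name", "<>", "x")).toMilvus()}} yields {{(id > 1) and (book_name != "x")}}. An operator outside the supported set throws, which in a planner rule would correspond to leaving the filter un-pushed.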




 
 * Projection Push-down:

Supports constant and column name projection push-down to minimize data 
transfer:


{code:sql}
SELECT book_name, 'content' FROM milvus.vector_table
{code}
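
One way to picture how projection push-down reduces data transfer: only the referenced columns are requested from Milvus. The sketch below is illustrative only; {{ProjectionSketch}} and {{Item}} are hypothetical names, and filling constants in on the client side is an assumption, since the description does not say where constants are materialized.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch: derive the fields to request from a simple projection list. */
final class ProjectionSketch {
  /** A projection item is either a column reference or a constant (never both). */
  static final class Item {
    final String column;   // non-null for a column reference
    final Object constant; // the literal value for a constant item

    private Item(String column, Object constant) {
      this.column = column;
      this.constant = constant;
    }

    static Item column(String name) {
      return new Item(name, null);
    }

    static Item constant(Object value) {
      return new Item(null, value);
    }
  }

  /**
   * Returns the distinct column names to request, in first-seen order.
   * Constants need no server-side field (assumption: they are added client-side).
   */
  static List<String> outputFields(List<Item> items) {
    Set<String> fields = new LinkedHashSet<>();
    for (Item item : items) {
      if (item.column != null) {
        fields.add(item.column);
      }
    }
    return new ArrayList<>(fields);
  }
}
```

For the query above, the projection list {{book_name, 'content'}} would request only the {{book_name}} field.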
 * Vector UDF Fallback: Adds the distance calculation UDFs L2_DISTANCE(), 
COSINE_DISTANCE(), and INNER_PRODUCT(). For complex operators (JOIN, UNION, 
aggregation) and for scenarios that cannot be pushed down (unsupported UDFs, 
least-similar vector queries), vector computation falls back to in-memory 
execution, so these queries still return results. Example:

{code:sql}
SELECT book_name, l2_distance(vector_field, '[...]') d
FROM milvus.vector_table
ORDER BY d DESC LIMIT 5 -- Find least similar Top N
{code}
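
For the in-memory fallback, the distance computations themselves are simple. The following is a minimal sketch, not the code in MilvusVectorUdfs: the class and method names are hypothetical, the vector literal is assumed to be a bracketed, comma-separated list as in the examples above, and the cosine function returns the similarity (larger means more similar), which is an assumption about what COSINE_DISTANCE() returns.

```java
/** Hypothetical sketch of the in-memory distance functions used on fallback. */
final class VectorUdfSketch {
  /** Parses a vector literal such as "[0.1, 0.2, 0.3]" into a float array. */
  static float[] parseVector(String literal) {
    String s = literal.trim();
    if (s.length() < 2 || s.charAt(0) != '[' || s.charAt(s.length() - 1) != ']') {
      throw new IllegalArgumentException("expected [v1, v2, ...]: " + literal);
    }
    String body = s.substring(1, s.length() - 1).trim();
    if (body.isEmpty()) {
      return new float[0];
    }
    String[] parts = body.split(",");
    float[] v = new float[parts.length];
    for (int i = 0; i < parts.length; i++) {
      v[i] = Float.parseFloat(parts[i].trim());
    }
    return v;
  }

  /** Euclidean distance (with square root); smaller means more similar. */
  static double l2Distance(float[] a, float[] b) {
    checkSameLength(a, b);
    double sum = 0;
    for (int i = 0; i < a.length; i++) {
      double d = (double) a[i] - b[i];
      sum += d * d;
    }
    return Math.sqrt(sum);
  }

  /** Inner (dot) product; larger means more similar. */
  static double innerProduct(float[] a, float[] b) {
    checkSameLength(a, b);
    double sum = 0;
    for (int i = 0; i < a.length; i++) {
      sum += (double) a[i] * b[i];
    }
    return sum;
  }

  /** Cosine similarity in [-1, 1]; larger means more similar. */
  static double cosineSimilarity(float[] a, float[] b) {
    double norms = Math.sqrt(innerProduct(a, a)) * Math.sqrt(innerProduct(b, b));
    if (norms == 0) {
      throw new ArithmeticException("cosine is undefined for a zero vector");
    }
    return innerProduct(a, b) / norms;
  }

  private static void checkSameLength(float[] a, float[] b) {
    if (a.length != b.length) {
      throw new IllegalArgumentException(
          "dimension mismatch: " + a.length + " vs " + b.length);
    }
  }
}
```

A least-similar Top N query like the one above would then compute the distance for each scanned row and let Calcite's in-memory sort and limit operators pick the rows.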

Core Components:
 * Milvus Metadata Layer:
 ** MilvusSchema: Manages all collections in a Milvus database and is 
responsible for automatic collection discovery
 ** MilvusTranslatableTable (Core): Table abstraction bridging Milvus 
collections and Calcite tables; provides field metadata and type mapping, and 
creates MilvusTableScan nodes in toRel()

 * Milvus SQL Operators & Rules Layer:
 ** MilvusTableScan: Handles Milvus data scanning and stores SQL Hints
 ** MilvusFilter / MilvusFilterRule: Implements predicate push-down, converting 
SQL conditions to Milvus expressions
 ** MilvusProject / MilvusProjectRule: Supports constant and column name 
projection push-down
 ** MilvusVectorSearch / MilvusVectorSearchRule (Core Optimization Rule): 
Identifies vector query patterns, validates the sort direction against the 
distance type (ASC for L2; DESC for IP and COSINE), and fuses the entire query 
into a single operator for push-down
 ** MilvusToEnumerableConverterRule / MilvusToEnumerableConverter: Code 
generation layer that converts Milvus physical operators to executable Java 
code, generating table.vectorSearch() or table.scan() calls
 ** MilvusRel: Defines the Milvus adapter's calling convention and serves as 
the unified interface for all Milvus operators

 * Milvus Query Execution Layer:
 ** MilvusSearchEnumerator / MilvusQueryEnumerator: Correspond to the Milvus 
Search (vector retrieval) and Query (paginated scanning) operations, 
respectively

 * Vector UDF Layer:
 ** MilvusOperatorTable / MilvusVectorFunction / MilvusVectorUdfs: Implement 
registration, declaration, and computation logic for the L2/Cosine/IP distance 
functions
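
The sort-direction check performed by MilvusVectorSearchRule can be illustrated with a small sketch. The names below ({{SortDirectionSketch}}, {{Metric}}, {{Direction}}, {{canPushDown}}) are hypothetical and not taken from the repository; the sketch only encodes the rule stated above, that a top-K search can serve the query when the ORDER BY direction puts the most similar rows first.

```java
/** Hypothetical sketch of the direction check done before fusing a vector search. */
final class SortDirectionSketch {
  enum Metric { L2, IP, COSINE }

  enum Direction { ASC, DESC }

  /**
   * Returns true when the requested direction yields the most similar rows first:
   * ascending for L2 (smaller is closer), descending for IP and COSINE
   * (larger is closer). Any other combination should stay un-fused.
   */
  static boolean canPushDown(Metric metric, Direction direction) {
    Direction nearestFirst = (metric == Metric.L2) ? Direction.ASC : Direction.DESC;
    return direction == nearestFirst;
  }
}
```

Under this check, {{ORDER BY l2_distance(...) DESC}} is rejected for push-down, which matches the least-similar Top N case that falls back to in-memory execution.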

h2. Use Cases:
 * Build a Milvus SQL Gateway Service: Provides a standard SQL interface for 
Milvus data read/write and collection management, improving usability and 
compatibility with existing SQL ecosystem tools.

 * Internally Developed Multimodal Compute Engine Integration: Serves as a 
vector search execution engine integrated into internal compute platforms, 
extending the engine's functionality and improving vector retrieval 
performance.


>  Support  Milvus Calcite Adapter 
> ---------------------------------
>
>                 Key: CALCITE-7353
>                 URL: https://issues.apache.org/jira/browse/CALCITE-7353
>             Project: Calcite
>          Issue Type: New Feature
>            Reporter: wgcn007
>            Priority: Major
>             Fix For: 1.42.0
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
