[ 
https://issues.apache.org/jira/browse/CALCITE-7353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18083450#comment-18083450
 ] 

wgcn007 commented on CALCITE-7353:
----------------------------------

Hi [~zabetak], thanks for your questions.


- What's Milvus? It's an open-source vector database optimized for large-scale 
similarity search (e.g., over high-dimensional embeddings from AI models). Check 
it out at [https://milvus.io/], and explore use cases at 
[https://milvus.io/use-cases]. It's powerful, but users have to learn its 
SDK, so a SQL layer makes it much more accessible.



- Is it a POC? No, it's productized. The adapter has passed our internal testing, 
and we're launching it this quarter as a MySQL-protocol-compatible SQL Gateway. 
Users can connect with any standard MySQL client (CLI, JDBC, DBeaver, etc.) and 
query Milvus directly.



- Current use case: We provide hosted Milvus services to other companies. Right 
now, the SQL interface lets users explore and preview data: query collections, 
filter fields, and run vector similarity search with familiar SQL instead of 
learning our SDKs.

>  Support Milvus Calcite Adapter
> ---------------------------------
>
>                 Key: CALCITE-7353
>                 URL: https://issues.apache.org/jira/browse/CALCITE-7353
>             Project: Calcite
>          Issue Type: New Feature
>            Reporter: wgcn007
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.42.0
>
>
> h2. Background
> The exponential growth of AI and large-model applications has driven a surge 
> in demand for vector similarity search. Databases such as PostgreSQL, Redis, 
> Doris, and Elasticsearch have already added vector retrieval support. Milvus, 
> a high-performance, cloud-native vector database designed for scalable 
> Approximate Nearest Neighbor (ANN) search, has been widely adopted. The 
> goal of this Jira is to make Milvus more accessible by building a full SQL 
> abstraction layer on top of it.
> h2. Implementation
> I have completed a feasibility-validated demo implementation in 
> [my repository|https://github.com/wg1026688210/calcite/commits/dev/], building a 
> Calcite adapter with operator push-down to ensure computation executes on the 
> Milvus side. Key capabilities include:
>  * Vector Search Push-down:
> MilvusVectorSearchRule matches Sort→Project→[Filter]→Scan patterns. 
> When ORDER BY contains a vector distance function, it fuses the entire query 
> (filtering, projection, vector search, sorting, LIMIT) into a single 
> MilvusVectorSearch operator that is pushed down to Milvus; search parameters 
> can be passed through a SQL hint:
> {code:sql}
> SELECT book_name, l2_distance(VectorFieldAutoTest, '[0.1, 0.2, 0.3, 0.4]') AS d
> FROM milvus.test_vector_search /*+ MILVUS_OPTIONS(nprobe='100000') */
> WHERE book_name <> '小王子'  -- 'The Little Prince'
> ORDER BY d
> LIMIT 3 {code}
>  
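The direction check this rule performs (listed under MilvusVectorSearchRule below: L2 ascending, IP/COSINE descending) can be sketched in Java as follows. This is a minimal illustration under assumptions, not the adapter's code: the class, enum, and method names are hypothetical, and a real Calcite rule would inspect `RelCollation` and `RexCall` nodes rather than a UDF name string.

```java
import java.util.Locale;

/** Hypothetical sketch: can ORDER BY <distance UDF> be answered by one Milvus ANN search? */
public final class VectorSortCheck {
  public enum Metric { L2, IP, COSINE }
  public enum Direction { ASC, DESC }

  /** Assumed mapping from the adapter's distance UDFs to Milvus metric types. */
  public static Metric metricOf(String udfName) {
    switch (udfName.toUpperCase(Locale.ROOT)) {
      case "L2_DISTANCE": return Metric.L2;
      case "INNER_PRODUCT": return Metric.IP;
      case "COSINE_DISTANCE": return Metric.COSINE;
      default: throw new IllegalArgumentException("not a vector distance UDF: " + udfName);
    }
  }

  /**
   * An ANN search returns the most similar rows first: smallest L2 distance,
   * largest inner product or cosine score. Only that sort direction can be
   * pushed down; the opposite ("least similar") direction cannot.
   */
  public static boolean canPushDown(String udfName, Direction dir) {
    Direction nearestFirst = metricOf(udfName) == Metric.L2 ? Direction.ASC : Direction.DESC;
    return dir == nearestFirst;
  }

  public static void main(String[] args) {
    System.out.println(canPushDown("l2_distance", Direction.ASC));    // true
    System.out.println(canPushDown("l2_distance", Direction.DESC));   // false: evaluated in memory
    System.out.println(canPushDown("inner_product", Direction.DESC)); // true
  }
}
```

A `false` result corresponds to the "least similar" case in the Vector UDF Fallback example: a Milvus search only returns nearest neighbors, so that ordering has to be computed outside Milvus.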
>  * Predicate Push-down:
> Converts scalar field comparison operators (=, <>, >, <, >=, <=, LIKE) and 
> logical operators (AND, OR, NOT) into Milvus expression strings for 
> server-side filtering:
> {code:sql}
> SELECT * FROM milvus.vector_table WHERE id > 1 {code}
>  
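The translation can be pictured as a recursive walk over the predicate tree. The Java sketch below uses a tiny hypothetical predicate model in place of Calcite's `RexNode`, and it assumes Milvus's boolean-expression syntax (`==`, `!=`, `and`, `or`, `not`, `like`, double-quoted string literals); check the Milvus filtering documentation for the exact grammar of your server version. It needs Java 16+ for records and `instanceof` patterns.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical sketch: render a tiny predicate tree as a Milvus boolean expression. */
public final class MilvusExprSketch {
  // Minimal stand-in for Calcite's RexNode; not the adapter's actual types.
  public interface Pred {}
  public record Cmp(String op, String field, Object value) implements Pred {}
  public record And(List<Pred> args) implements Pred {}
  public record Or(List<Pred> args) implements Pred {}
  public record Not(Pred arg) implements Pred {}

  // SQL operator -> Milvus expression operator (assumed Milvus syntax).
  private static final Map<String, String> OPS = Map.of(
      "=", "==", "<>", "!=", ">", ">", "<", "<", ">=", ">=", "<=", "<=", "LIKE", "like");

  public static String toMilvus(Pred p) {
    if (p instanceof Cmp c) {
      String op = OPS.get(c.op());
      if (op == null) {
        throw new IllegalArgumentException("operator not pushable: " + c.op());
      }
      return c.field() + " " + op + " " + literal(c.value());
    }
    if (p instanceof And a) return join(a.args(), " and ");
    if (p instanceof Or o) return join(o.args(), " or ");
    if (p instanceof Not n) return "not (" + toMilvus(n.arg()) + ")";
    throw new IllegalArgumentException("unsupported predicate: " + p);
  }

  // Parenthesize every operand so the result never depends on operator precedence.
  private static String join(List<Pred> args, String sep) {
    return args.stream().map(a -> "(" + toMilvus(a) + ")").collect(Collectors.joining(sep));
  }

  // Strings become double-quoted literals with \ and " escaped; numbers are emitted as-is.
  private static String literal(Object v) {
    if (v instanceof String s) {
      return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }
    return String.valueOf(v);
  }
}
```

For `WHERE id > 1 AND book_name <> '小王子'`, the call `toMilvus(new And(List.of(new Cmp(">", "id", 1), new Cmp("<>", "book_name", "小王子"))))` yields `(id > 1) and (book_name != "小王子")`. In a typical Calcite adapter, any condition the translator rejects would stay in a Calcite-side filter instead of being sent to Milvus.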
>  * Projection Push-down:
> Supports constant and column name projection push-down to minimize data 
> transfer:
> {code:sql}
> SELECT book_name, 'content' FROM milvus.vector_table {code}
>  
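One way to picture this push-down: column references become the list of output fields requested from Milvus, while constants are never fetched and are filled into each row locally. The Java sketch below is an assumption-based illustration; the `Item` record and both methods are hypothetical, and the actual MilvusProject implementation may handle this differently. Records need Java 16+.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of projection push-down for constant and column-name projections. */
public final class ProjectionSketch {
  /** A SELECT-list item: either a column reference or a constant. */
  public record Item(String column, Object constant) {
    public static Item col(String name) { return new Item(name, null); }
    public static Item lit(Object value) { return new Item(null, value); }
    public boolean isColumn() { return column != null; }
  }

  /** Fields to request from Milvus: only the distinct columns the SELECT list references. */
  public static List<String> outputFields(List<Item> items) {
    LinkedHashSet<String> fields = new LinkedHashSet<>(); // keeps first-seen order, drops duplicates
    for (Item item : items) {
      if (item.isColumn()) fields.add(item.column());
    }
    return new ArrayList<>(fields);
  }

  /** Builds one output row from a Milvus entity (field name -> value); constants are filled locally. */
  public static Object[] project(List<Item> items, Map<String, Object> entity) {
    Object[] row = new Object[items.size()];
    for (int k = 0; k < items.size(); k++) {
      Item item = items.get(k);
      row[k] = item.isColumn() ? entity.get(item.column()) : item.constant();
    }
    return row;
  }
}
```

For `SELECT book_name, 'content'`, this sketch requests only `book_name` from Milvus and appends `'content'` to each row itself.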
>  * Vector UDF Fallback:
> Adds the distance UDFs L2_DISTANCE(), COSINE_DISTANCE(), and 
> INNER_PRODUCT(). For complex operators (JOIN/UNION/AGG) or queries that 
> cannot be pushed down (unsupported UDFs, least-similar vector queries), vector 
> computation falls back to in-memory execution, so these queries are still 
> fully supported. Example:
>  
> {code:sql}
> SELECT book_name, l2_distance(vector_field, '[...]') d
>   FROM milvus.vector_table
>   ORDER BY d DESC LIMIT 5  -- the 5 least similar rows {code}
>  
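For the in-memory fallback, the three UDFs reduce to simple loops over two vectors. The Java sketch below parses the `'[0.1, 0.2, 0.3, 0.4]'` literal format used in the examples above and implements the distances under stated assumptions: the issue does not say whether `L2_DISTANCE` returns the Euclidean distance or its square, or whether `COSINE_DISTANCE` returns cosine similarity or one minus it, so this sketch uses the Euclidean distance and cosine similarity (the latter matches the descending sort direction listed for COSINE). `farthestL2` shows the "least similar" query evaluated in memory. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical in-memory versions of the vector distance UDFs described above. */
public final class VectorUdfSketch {
  /** Parses a literal such as "[0.1, 0.2, 0.3, 0.4]" into a float vector. */
  public static float[] parseVector(String literal) {
    String s = literal.trim();
    if (!s.startsWith("[") || !s.endsWith("]")) {
      throw new IllegalArgumentException("expected a [..] vector literal: " + literal);
    }
    s = s.substring(1, s.length() - 1).trim();
    if (s.isEmpty()) return new float[0];
    String[] parts = s.split(",");
    float[] v = new float[parts.length];
    for (int i = 0; i < parts.length; i++) v[i] = Float.parseFloat(parts[i].trim());
    return v;
  }

  private static void checkDims(float[] a, float[] b) {
    if (a.length != b.length) {
      throw new IllegalArgumentException("dimension mismatch: " + a.length + " vs " + b.length);
    }
  }

  /** Euclidean distance (assumption: not squared). */
  public static double l2Distance(float[] a, float[] b) {
    checkDims(a, b);
    double sum = 0;
    for (int i = 0; i < a.length; i++) {
      double d = (double) a[i] - b[i];
      sum += d * d;
    }
    return Math.sqrt(sum);
  }

  public static double innerProduct(float[] a, float[] b) {
    checkDims(a, b);
    double sum = 0;
    for (int i = 0; i < a.length; i++) sum += (double) a[i] * b[i];
    return sum;
  }

  /** Cosine similarity (assumption): inner product divided by the product of the norms. */
  public static double cosineSimilarity(float[] a, float[] b) {
    double denom = Math.sqrt(innerProduct(a, a)) * Math.sqrt(innerProduct(b, b));
    if (denom == 0) throw new IllegalArgumentException("cosine is undefined for a zero vector");
    return innerProduct(a, b) / denom;
  }

  /** In-memory "ORDER BY l2_distance(...) DESC LIMIT n": indices of the n farthest rows. */
  public static List<Integer> farthestL2(float[] query, List<float[]> rows, int n) {
    double[] dist = new double[rows.size()];
    List<Integer> idx = new ArrayList<>();
    for (int i = 0; i < rows.size(); i++) {
      dist[i] = l2Distance(query, rows.get(i));
      idx.add(i);
    }
    idx.sort(Comparator.comparingDouble((Integer k) -> dist[k]).reversed());
    return new ArrayList<>(idx.subList(0, Math.min(n, idx.size())));
  }
}
```

Sorting every row is the simplest way to express the fallback; a bounded heap of size n would avoid the full sort on large inputs.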
> Core Components:
>  * Milvus Metadata Layer:
>  ** MilvusSchema: Manages all collections in a Milvus database and 
> discovers them automatically
>  ** MilvusTranslatableTable (Core): Table abstraction bridging Milvus 
> collections and Calcite tables; provides field metadata and type mapping, and 
> creates MilvusTableScan nodes in toRel()
>  * Milvus SQL Operators & Rules Layer:
>  ** MilvusTableScan: Scans Milvus data and stores SQL hints
>  ** MilvusFilter / MilvusFilterRule: Implement predicate push-down, 
> converting SQL conditions to Milvus expressions
>  ** MilvusProject / MilvusProjectRule: Support constant and column-name 
> projection push-down
>  ** MilvusVectorSearch / MilvusVectorSearchRule (Core Optimization Rule): 
> Identify vector query patterns, validate the sort direction against the 
> distance type (L2: ASC; IP/COSINE: DESC), and fuse the entire query into a 
> single operator for push-down
>  ** MilvusToEnumerableConverterRule / MilvusToEnumerableConverter: Code 
> generation layer that converts Milvus physical operators to executable Java 
> code, generating table.vectorSearch() or table.scan() calls
>  ** MilvusRel: Defines the Milvus adapter's calling convention; the common 
> interface implemented by all Milvus operators
>  * Milvus Query Execution Layer:
>  ** MilvusSearchEnumerator / MilvusQueryEnumerator: Correspond to the Search 
> (vector retrieval) and Query (paginated scanning) operations in Milvus, 
> respectively.
>  * Vector UDF:
>  ** MilvusOperatorTable / MilvusVectorFunction / MilvusVectorUdfs: Implement 
> registration, declaration, and computation logic for the L2/Cosine/IP distance 
> functions
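MilvusQueryEnumerator is described as performing paginated scanning. A minimal Java sketch of that pattern, assuming offset/limit paging: `fetchPage` is a hypothetical stand-in for the Milvus query call, and a page shorter than the page size is taken to mean the collection is exhausted. It implements `java.util.Iterator` rather than Calcite's `Enumerator` so that it compiles without Calcite on the classpath.

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.BiFunction;

/** Hypothetical sketch of offset/limit paging behind a MilvusQueryEnumerator-like iterator. */
public final class PagedScan<T> implements Iterator<T> {
  // (offset, limit) -> rows; stands in for one Milvus query request.
  private final BiFunction<Long, Integer, List<T>> fetchPage;
  private final int pageSize;
  private long offset = 0;
  private Iterator<T> current = Collections.emptyIterator();
  private boolean exhausted = false;

  public PagedScan(BiFunction<Long, Integer, List<T>> fetchPage, int pageSize) {
    if (pageSize <= 0) throw new IllegalArgumentException("pageSize must be positive");
    this.fetchPage = fetchPage;
    this.pageSize = pageSize;
  }

  @Override
  public boolean hasNext() {
    // Fetch the next page only when the current one is used up.
    while (!current.hasNext() && !exhausted) {
      List<T> page = fetchPage.apply(offset, pageSize);
      offset += page.size();
      if (page.size() < pageSize) exhausted = true; // short page: no more data
      current = page.iterator();
    }
    return current.hasNext();
  }

  @Override
  public T next() {
    if (!hasNext()) throw new NoSuchElementException();
    return current.next();
  }
}
```

Offset paging keeps the sketch short, but Milvus limits how large `offset + limit` may be for a single query, so a production enumerator may need primary-key ranges or the SDK's query iterator for large collections.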
> h2. Use Cases
>  * Milvus SQL Gateway Service: Provides a standard SQL interface for reading 
> and writing Milvus data and managing collections, significantly improving 
> usability and compatibility with existing SQL ecosystem tools.
>  * Integration with an internally developed multimodal compute engine: Serves 
> as the vector search execution engine within internal compute platforms, 
> extending the engine's functionality and improving vector retrieval 
> performance.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
