Hello Beam Dev Community,

I'm excited to share the design document for the Milvus Vector Enrichment Handler for Apache Beam as part of my GSoC 2025 project.
This enrichment handler will enable streaming and batch pipelines to leverage Milvus vector database capabilities for semantic similarity searches directly in Beam data processing workflows. The handler will support vector search, keyword-based search, and hybrid search strategies with various similarity metrics.

Here is the link to the design document:
https://docs.google.com/document/d/1lzoSGSblrFtf7YK9n5p9BEBw8jRhKeOOKqy0FP1urvg/edit?usp=sharing

This implementation is part of the GSoC 2025 ML Integration project being tracked here:
https://github.com/apache/beam/issues/35046

I welcome any feedback, suggestions, or questions about the design approach.

Thank you,
Mohamed