Hi, I'd like to start a discussion of FLIP-540: Support VECTOR_SEARCH in Flink SQL[1].
In FLIP-437 and FLIP-525, Apache Flink introduced initial Large Language Model (LLM) integration, enabling semantic understanding and real-time processing in streaming data pipelines. This integration has been technically validated in scenarios such as log classification and real-time question-answering systems. However, under the current architecture, Flink can only use embedding models to convert unstructured data (e.g., text, images) into high-dimensional vectors, which are then persisted to downstream storage systems (e.g., Milvus, MongoDB). It cannot query those vectors online or run similarity analysis over them in real time.

To address this limitation, this FLIP proposes introducing the VECTOR_SEARCH function, which lets users perform streaming vector similarity search and real-time context retrieval (e.g., for Retrieval-Augmented Generation, RAG) directly within Flink.

Looking forward to comments and suggestions for improvements!

Best,
Shengkai

[1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-540%3A+Support+VECTOR_SEARCH+in+Flink+SQL
