rahil-c opened a new issue, #14219: URL: https://github.com/apache/hudi/issues/14219
### Feature Description

#### What the feature achieves:

This feature enables native vector similarity search capabilities directly on Hudi tables. It allows users to store, manage, and query vector embeddings (e.g., from text, image, or audio models) alongside structured data, and perform nearest-neighbor searches using distance metrics such as cosine, dot product, or Euclidean distance, all within Hudi tables. This brings AI/ML-centric search workloads (semantic, multimodal, or embedding-based retrieval) natively into the Hudi lakehouse.

#### Why this feature is needed:

Modern data lakes increasingly store unstructured or multimodal data (text, images, video) with associated embeddings for retrieval and ranking. Today, vector search is typically performed outside the lakehouse using specialized vector databases, leading to data duplication, inconsistency, and complex pipelines. Adding native vector search to Hudi unifies structured and vector data management, reduces latency between ingestion and retrieval, and enables scalable AI/ML workflows directly on the lakehouse without external dependencies.

### User Experience

**How users will use this feature:**

Please read RFC 102: https://github.com/apache/hudi/pull/14218

### Hudi RFC Requirements

**RFC PR link:** (if applicable)

https://github.com/apache/hudi/pull/14218

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
