rahil-c opened a new issue, #18079: URL: https://github.com/apache/hudi/issues/18079
### Feature Description

**What the feature achieves:**

This feature provides the ability to perform a native **vector similarity search** on Hudi tables.

**Why this feature is needed:**

Building on [RFC-100](https://github.com/apache/hudi/pull/13924/files?short_path=a945f8d#diff-f05ae69c4f41edc32aabfbfc016a12ad1af72917314844f8ae52671234508c56) (unstructured data storage in Hudi), Hudi tables would contain unstructured content (e.g., images, video, documents) as well as the related *embeddings* for that content. The next natural requirement for AI/ML workloads on Hudi is to **search these embeddings efficiently**.

### User Experience

**How users will use this feature:**

The initial scope of this feature is to let Spark users perform vector search through a new `vector_search` SQL function, similar to the other table-valued functions in Hudi.

See the proposed RFC for more details: https://github.com/apache/hudi/pull/14218/changes.

### Hudi RFC Requirements

**RFC PR link:**

See the proposed RFC for more details: https://github.com/apache/hudi/pull/14218/changes

**Why RFC is/isn't needed:**

- Does this change public interfaces/APIs? (Yes/No)
- Does this change storage format? (Yes/No)
- Justification:

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
