rahil-c opened a new issue, #14219:
URL: https://github.com/apache/hudi/issues/14219

   ### Feature Description
   
   #### What the feature achieves:
   
   This feature adds native vector similarity search to Hudi tables. Users can 
store, manage, and query vector embeddings (e.g., from text, image, or audio 
models) alongside structured data, and run nearest-neighbor searches using 
distance metrics such as cosine, dot product, or Euclidean distance. This 
brings AI/ML-centric search workloads (semantic, multimodal, or 
embedding-based retrieval) natively into the Hudi lakehouse.
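
To make the three metrics concrete, here is a minimal brute-force nearest-neighbor sketch in plain Python. It is not the API proposed in the RFC and does not touch Hudi at all: the `(record_key, embedding)` row shape and the `top_k` helper are hypothetical names chosen for illustration only. A real implementation would likely rely on an index rather than a full scan.

```python
import math

def dot(a, b):
    """Dot product: larger means more similar."""
    return sum(x * y for x, y in zip(a, b))

def euclidean(a, b):
    """Euclidean distance: smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    """Cosine similarity: larger means more similar; 0.0 for a zero vector."""
    norm = math.sqrt(dot(a, a)) * math.sqrt(dot(b, b))
    return dot(a, b) / norm if norm else 0.0

def top_k(query, rows, k, metric="cosine"):
    """Return the keys of the k rows closest to `query`.

    `rows` is a list of (record_key, embedding) pairs (a hypothetical shape,
    assumed here only for the example).
    """
    if metric == "euclidean":
        scored = [(euclidean(query, vec), key) for key, vec in rows]
        scored.sort(key=lambda s: s[0])                # ascending distance
    elif metric in ("cosine", "dot"):
        fn = cosine_similarity if metric == "cosine" else dot
        scored = [(fn(query, vec), key) for key, vec in rows]
        scored.sort(key=lambda s: s[0], reverse=True)  # descending similarity
    else:
        raise ValueError(f"unknown metric: {metric}")
    return [key for _, key in scored[:k]]

rows = [("a", [1.0, 0.0]), ("b", [0.0, 1.0]), ("c", [1.0, 1.0])]
nearest_by_cosine = top_k([1.0, 0.0], rows, k=2, metric="cosine")
```

Note that the ranking direction differs by metric: Euclidean distance is minimized, while cosine similarity and dot product are maximized.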
   
   #### Why this feature is needed:
   
   Modern data lakes increasingly store unstructured or multimodal data (text, 
images, video) with associated embeddings for retrieval and ranking. Today, 
vector search is typically performed outside the lakehouse using specialized 
vector databases, leading to data duplication, inconsistency, and complex 
pipelines. Adding native vector search to Hudi unifies structured and vector 
data management, reduces latency between ingestion and retrieval, and enables 
scalable AI/ML workflows directly on the lakehouse without external 
dependencies.
   
   ### User Experience
   
   **How users will use this feature:**
   See RFC 102 for the proposed user experience: https://github.com/apache/hudi/pull/14218
   
   
   ### Hudi RFC Requirements
   
   **RFC PR link:**
   https://github.com/apache/hudi/pull/14218
   
   

