siddharthteotia commented on issue #10919:
URL: https://github.com/apache/pinot/issues/10919#issuecomment-1592971805

   Glad to see there are others thinking about this as well. 
   
   I recently wrote a short internal proposal making the case for vector storage and indexing in Pinot.
   
   I think the first thing we need to do is reach alignment / consensus within the community that vector search makes sense in Pinot.
   
   Below are the Description and Business Justification from our internal proposal. @jasperjiaguo can add more details.
   
   **Description**
   
   Vector embeddings are numerical representations of data as coordinates in a multi-dimensional space, typically produced by training a machine learning model. For example, training a large language model (LLM) on text can produce billions of vector embeddings, which are distilled representations of the text / words (the training data). The goal is to build optimal storage, indexing, and query execution capabilities for this kind of data.
   
   **Benefit / Use Case**
   
   This can be a crucial foundation for AI systems that leverage high-performance similarity indexing and analytics on vector embeddings for recommendation, image matching, pattern recognition, anomaly detection, etc.
   
   Specifically, in LLM and prompt engineering pipelines, vector storage, indexing, and querying can be used to store and query domain-specific facts (created during training, e.g. neural network learning), which can then be fed into NLP models, chatbots, conversational prompts, etc.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

