[ 
https://issues.apache.org/jira/browse/IMPALA-12467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17770158#comment-17770158
 ] 

Sreenath commented on IMPALA-12467:
-----------------------------------

Yup, and this would bring the strength of vector databases into Impala.

> Semantic Search In Impala
> -------------------------
>
>                 Key: IMPALA-12467
>                 URL: https://issues.apache.org/jira/browse/IMPALA-12467
>             Project: IMPALA
>          Issue Type: Wish
>            Reporter: Sreenath
>            Priority: Major
>
> Semantic search is a way for computers to understand the meaning behind words 
> and phrases when you're searching for something. Instead of just looking for 
> exact matches of keywords, it tries to figure out what you're really asking 
> and provides results that are more relevant and meaningful to your question. 
> It's like having a search engine that can understand what you mean, not just 
> what you say, making it easier to find the information you're looking for. 
> This ticket is a wish to have semantic search in Impala.
> On the implementation side, semantic search combines an embedding model, 
> which maps each value to a vector, with a similarity or distance function 
> over those vectors.
> My proposal is to implement functions for on-the-fly calculation of 
> similarity distance between two values. Once we have them, we could easily 
> do semantic search as part of a WHERE clause.
>  * E.g. (using a cosine similarity function): "WHERE cos_dist(region, 
> 'europe') > 0.9". This could return records with regions like 
> Scandinavia, Nordic, Baltic, etc.
>  * We could have functions that accept values as text or as vector 
> embeddings.
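
The filtering idea in the description can be sketched outside Impala as follows. This is only an illustration in Python, not Impala code or a proposed implementation. The three-dimensional vectors are hand-made stand-ins for what an embedding model would produce, and the names EMBEDDINGS, cosine_similarity and semantic_filter are hypothetical. The comparison mirrors the ticket's "> 0.9" example, so a higher score means a closer match.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Convention chosen for this sketch: a zero vector matches nothing.
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical toy embeddings; a real embedding model would produce
# much longer vectors for each text value.
EMBEDDINGS = {
    "europe": [0.9, 0.1, 0.0],
    "scandinavia": [0.85, 0.2, 0.05],
    "nordic": [0.8, 0.25, 0.0],
    "sahara": [0.0, 0.1, 0.95],
}


def semantic_filter(values, query, threshold=0.9):
    """Keep the values whose embedding is close to the query's embedding."""
    query_vec = EMBEDDINGS[query]
    return [
        value
        for value in values
        if cosine_similarity(EMBEDDINGS[value], query_vec) > threshold
    ]


print(semantic_filter(["scandinavia", "nordic", "sahara"], "europe"))
# prints ['scandinavia', 'nordic']
```

A function that accepts text would have to run the embedding model per row, while a function that accepts stored vectors only needs the arithmetic shown in cosine_similarity.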



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
