Hi there,

Thank you for looping me in. Just a few random thoughts on this topic: 

- I’ve heard ;-) that this ES plugin is fast for vector-based scoring: 
https://github.com/StaySense/fast-cosine-similarity. The links in the ‘General’ 
section provide some history. As far as I can see, there is nothing which 
couldn’t be implemented at Lucene level.

- For retrieval, I think a two-pass approach looks like something worth trying 
out. First pass: look up documents in a low dimensional space (maybe produced 
via LSH) and then, in the second pass, calculate vector distances in the 
high-dimensional space just for the documents that resulted from the first 
pass. This solution will come with some compromises to make. For example, a 
higher dimensionality of LSH would increase precision but also produce more 
hash tokens and make the lookup slower, especially for large indexes.

- Day 2 of Haystack 2019 (https://haystackconf.com/agenda/) will have a talk by 
Simon Hughes about ’Search with Vectors’. There is a channel on this topic at 
OpenSource Connections’ search relevance Slack (https://relevancy.slack.com) 
and Simon has been one of the drivers of the discussion.

Best,
René


> On 1 Mar 2019, at 20:51, Pedram Rezaei <pedr...@microsoft.com> wrote:
> 
> Thank you for sharing, and it is exciting to see how advanced your thinking 
> is.
>  
> Yes, the idea is the same idea with an extra step that Rene also seems to 
> elude to here 
> <https://www.slideshare.net/RenKriegler/a-picture-is-worth-a-thousand-words-93680178>
>  in his comment. Instead of using these types of techniques only at the 
> scoring time, we can use them for information retrieval from the index. This 
> will allow us to, for example, index millions of images and quickly and 
> efficiently lookup the most relevant images.
>  
> I would love to hear yours and others thoughts on this. I think there is a 
> great opportunity here, but it would need a lot of input and guidance from 
> the experts here.
>  
> Thank you,
>  
> Pedram
>  
> From: David Smiley <david.w.smi...@gmail.com> 
> Sent: Friday, March 1, 2019 12:11 PM
> To: dev@lucene.apache.org
> Cc: Radhakrishnan Srikanth (SRIKANTH) <rsri...@microsoft.com>; Arun Sacheti 
> <ar...@bing.com>; Kun Wu <wu....@microsoft.com>; Junhua Wang 
> <junhua.w...@microsoft.com>; Jason Li <ja...@microsoft.com>; René Kriegler 
> <p...@rene-kriegler.com>
> Subject: Re: Vector based store and ANN
>  
> This presentation by Rene Kriegler at Haystack 2018 was a real eye-opener to 
> me on this subject: https://haystackconf.com/2018/relevance-scoring/ 
> <https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fhaystackconf.com%2F2018%2Frelevance-scoring%2F&data=02%7C01%7Cpedramr%40microsoft.com%7Cd4ac932962eb42ef813e08d69e8216cd%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C636870678908753995&sdata=sD7ZF4x1iXIjJ1GDAwlc0lUWkTpkarEkd2SAXI5qev0%3D&reserved=0>.
>  Uses random-projection forests which is a very clever technique.  (CC'ing 
> Rene)
>  
> ~ David
> 
> On Fri, Mar 1, 2019 at 1:30 PM Pedram Rezaei <pedr...@microsoft.com.invalid 
> <mailto:pedr...@microsoft.com.invalid>> wrote:
> Hi there,
>  
> Thank you for the responses. Yes, we have a few scenarios in mind that can 
> benefit from a vector-based index optimized for ANN searches:
>  
> Advanced, optimized, and high precision visual search: For this to work, we 
> would convert the images to their vector representations and then use 
> algorithms and implementations such as SPTAG 
> <https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2FMicrosoft%2FSPTAG&data=02%7C01%7Cpedramr%40microsoft.com%7Cd4ac932962eb42ef813e08d69e8216cd%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C636870678908763999&sdata=pOKRUksZ4sTsgtbE7eW88kiFLovTAQJRiPz%2F2LQXvCg%3D&reserved=0>,
>  FAISS 
> <https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Ffacebookresearch%2Ffaiss&data=02%7C01%7Cpedramr%40microsoft.com%7Cd4ac932962eb42ef813e08d69e8216cd%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C636870678908763999&sdata=if7uUn9OysK1c%2FDh6qb7hLcWGuaDjU9W5gKF2JQzOrk%3D&reserved=0>,
>  and HNSWLIB 
> <https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fnmslib%2Fhnswlib&data=02%7C01%7Cpedramr%40microsoft.com%7Cd4ac932962eb42ef813e08d69e8216cd%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C636870678908774009&sdata=%2BFHGSAWnlsfe%2BhLiimjz1T%2B3YMH90pO%2FXSi15Eszzmg%3D&reserved=0>.
> Advanced document retrieval: Using a numerical vector representation of a 
> document, we could improve the search result
> Nearest neighbor queries: discovering the nearest neighbors to a given query 
> could also benefit from these ANN algorithms (although doesn’t necessarily 
> need the vector based index)
>  
> I would be grateful to hear your thoughts and whether the community is open 
> to a conversation on this topic with my team.
>  
> Thanks,
>  
> Pedram
>  
> From: J. Delgado <joaquin.delg...@gmail.com 
> <mailto:joaquin.delg...@gmail.com>> 
> Sent: Thursday, February 28, 2019 7:38 AM
> To: dev@lucene.apache.org <mailto:dev@lucene.apache.org>
> Cc: Radhakrishnan Srikanth (SRIKANTH) <rsri...@microsoft.com 
> <mailto:rsri...@microsoft.com>>
> Subject: Re: Vector based store and ANN
>  
> Lucene’s scoring function (which I believe is okapi BM25  
> https://en.m.wikipedia.org/wiki/Okapi_BM25 
> <https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fen.m.wikipedia.org%2Fwiki%2FOkapi_BM25&data=02%7C01%7Cpedramr%40microsoft.com%7Cd4ac932962eb42ef813e08d69e8216cd%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C636870678908774009&sdata=UsNUOOH88fog95sKTM%2FkgjYak5%2Bp%2F%2BWaMZYsMAgQ5MA%3D&reserved=0>)
>  is a kind of nearest neighbor using the TF-IDF vector representation of 
> documents and query. Are you interested in ANN to be applied to a different 
> kind of vector representation, say for example Doc2Vec?
>  
> On Thu, Feb 28, 2019 at 5:59 AM Adrien Grand <jpou...@gmail.com 
> <mailto:jpou...@gmail.com>> wrote:
> Hi Pedram,
> 
> We don't have much in this area, but I'm hearing increasing interest
> so it'd be nice to get better there! The closest that we have is this
> class that can search for nearest neighbors for a vector of up to 8
> dimensions: 
> https://github.com/apache/lucene-solr/blob/master/lucene/sandbox/src/java/org/apache/lucene/document/FloatPointNearestNeighbor.java
>  
> <https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fapache%2Flucene-solr%2Fblob%2Fmaster%2Flucene%2Fsandbox%2Fsrc%2Fjava%2Forg%2Fapache%2Flucene%2Fdocument%2FFloatPointNearestNeighbor.java&data=02%7C01%7Cpedramr%40microsoft.com%7Cd4ac932962eb42ef813e08d69e8216cd%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C636870678908784014&sdata=XrrdrkhWOHp8%2FYLGowJK5%2B3km0f04Nr6BxPFxbiRQdM%3D&reserved=0>.
> 
> On Wed, Feb 27, 2019 at 1:44 AM Pedram Rezaei
> <pedr...@microsoft.com.invalid <mailto:pedr...@microsoft.com.invalid>> wrote:
> >
> > Hi there,
> >
> >
> >
> > Is there a way to store numerical vectors (vector based index) and perform 
> > search based on Approximate Nearest Neighbor class of algorithms in Lucene?
> >
> >
> >
> > If not, has there been any interests in the topic so far?
> >
> >
> >
> > Thanks,
> >
> >
> >
> > Pedram
> 
> 
> 
> -- 
> Adrien
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org 
> <mailto:dev-unsubscr...@lucene.apache.org>
> For additional commands, e-mail: dev-h...@lucene.apache.org 
> <mailto:dev-h...@lucene.apache.org>
> -- 
> Lucene/Solr Search Committer (PMC), Developer, Author, Speaker
> LinkedIn: http://linkedin.com/in/davidwsmiley 
> <https://nam06.safelinks.protection.outlook.com/?url=http%3A%2F%2Flinkedin.com%2Fin%2Fdavidwsmiley&data=02%7C01%7Cpedramr%40microsoft.com%7Cd4ac932962eb42ef813e08d69e8216cd%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C636870678908794023&sdata=rmLY5WMZtQCZ99yumefC%2BQoglS4JeONfLShsj5qaWkU%3D&reserved=0>
>  | Book: http://www.solrenterprisesearchserver.com 
> <https://nam06.safelinks.protection.outlook.com/?url=http%3A%2F%2Fwww.solrenterprisesearchserver.com&data=02%7C01%7Cpedramr%40microsoft.com%7Cd4ac932962eb42ef813e08d69e8216cd%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C636870678908794023&sdata=DZslOJYShNLZ9GOSpstuq85F%2FwVrFtnZIVDiXe%2F%2B0fw%3D&reserved=0>

Reply via email to