This presentation by Rene Kriegler at Haystack 2018 was a real eye-opener
to me on this subject: https://haystackconf.com/2018/relevance-scoring/. It uses
random-projection forests, which is a very clever technique.  (CC'ing Rene)
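
For anyone unfamiliar with the idea, here is a minimal sketch of a
random-projection forest in plain Python. It is not the code from the talk
and not a Lucene API; the function names (build_tree, query_tree,
forest_query) and the parameters (leaf_size, the number of trees) are made up
for illustration. Each tree splits the points with random hyperplanes, and a
query only compares against the points sharing its leaf in each tree.

```python
import random


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def build_tree(points, ids, leaf_size, rng):
    """Recursively split ids at the median projection onto a random direction."""
    if len(ids) <= leaf_size:
        return ("leaf", ids)
    dim = len(points[0])
    direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    ordered = sorted(ids, key=lambda i: dot(points[i], direction))
    mid = len(ordered) // 2
    threshold = dot(points[ordered[mid]], direction)
    return (
        "node",
        direction,
        threshold,
        build_tree(points, ordered[:mid], leaf_size, rng),
        build_tree(points, ordered[mid:], leaf_size, rng),
    )


def query_tree(tree, q):
    """Walk from the root to the leaf on the query's side of each hyperplane."""
    while tree[0] == "node":
        _, direction, threshold, left, right = tree
        tree = left if dot(q, direction) < threshold else right
    return tree[1]


def forest_query(points, forest, q, k):
    """Union the candidate leaves of all trees, then rank them exactly."""
    candidates = set()
    for tree in forest:
        candidates.update(query_tree(tree, q))

    def sqdist(i):
        return sum((a - b) ** 2 for a, b in zip(points[i], q))

    return sorted(candidates, key=sqdist)[:k]


# Hypothetical data: 200 random 16-dimensional vectors and a forest of 5 trees.
rng = random.Random(42)
points = [[rng.random() for _ in range(16)] for _ in range(200)]
forest = [build_tree(points, list(range(len(points))), 10, rng) for _ in range(5)]
print(forest_query(points, forest, points[7], 3))
```

More trees raise recall at the cost of more candidates to rank. In a search
engine the leaf identifiers could presumably be indexed as ordinary terms,
but that part is an assumption on my side and is not shown here.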

~ David

On Fri, Mar 1, 2019 at 1:30 PM Pedram Rezaei <pedr...@microsoft.com.invalid>
wrote:

> Hi there,
>
>
>
> Thank you for the responses. Yes, we have a few scenarios in mind that can
> benefit from a vector-based index optimized for ANN searches:
>
>
>
>    - Advanced, optimized, and high precision visual search: For this to
>    work, we would convert the images to their vector representations and then
>    use algorithms and implementations such as SPTAG
>    <https://github.com/Microsoft/SPTAG>, FAISS
>    <https://github.com/facebookresearch/faiss>, and HNSWLIB
>    <https://github.com/nmslib/hnswlib>.
>    - Advanced document retrieval: Using a numerical vector representation
>    of a document, we could improve the search results.
>    - Nearest neighbor queries: Discovering the nearest neighbors to a
>    given query could also benefit from these ANN algorithms (although this
>    doesn’t necessarily need the vector-based index).
>
>
>
> I would be grateful to hear your thoughts and whether the community is
> open to a conversation on this topic with my team.
>
>
>
> Thanks,
>
>
>
> Pedram
>
>
>
> *From:* J. Delgado <joaquin.delg...@gmail.com>
> *Sent:* Thursday, February 28, 2019 7:38 AM
> *To:* dev@lucene.apache.org
> *Cc:* Radhakrishnan Srikanth (SRIKANTH) <rsri...@microsoft.com>
> *Subject:* Re: Vector based store and ANN
>
>
>
> Lucene’s scoring function (which I believe is Okapi BM25,
> https://en.m.wikipedia.org/wiki/Okapi_BM25)
> is a kind of nearest neighbor using the TF-IDF vector representation of
> documents and query. Are you interested in ANN to be applied to a different
> kind of vector representation, say for example Doc2Vec?
>
>
>
> On Thu, Feb 28, 2019 at 5:59 AM Adrien Grand <jpou...@gmail.com> wrote:
>
> Hi Pedram,
>
> We don't have much in this area, but I'm hearing increasing interest
> so it'd be nice to get better there! The closest that we have is this
> class that can search for nearest neighbors for a vector of up to 8
> dimensions:
> https://github.com/apache/lucene-solr/blob/master/lucene/sandbox/src/java/org/apache/lucene/document/FloatPointNearestNeighbor.java
> .
>
> On Wed, Feb 27, 2019 at 1:44 AM Pedram Rezaei
> <pedr...@microsoft.com.invalid> wrote:
> >
> > Hi there,
> >
> >
> >
> > Is there a way to store numerical vectors (vector-based index) and
> > perform search based on the Approximate Nearest Neighbor class of
> > algorithms in Lucene?
> >
> >
> >
> > If not, has there been any interest in the topic so far?
> >
> >
> >
> > Thanks,
> >
> >
> >
> > Pedram
>
>
>
> --
> Adrien
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>
> --
Lucene/Solr Search Committer (PMC), Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
http://www.solrenterprisesearchserver.com
