Re: Experience re OpenAI embeddings in combination with Lucene vector search

Michael Wechner Mon, 14 Feb 2022 12:08:12 -0800

Hi Julie

Thanks very much for this link, which is very interesting!


Btw, do you have an idea how to increase the default maximum vector size of 1024 dimensions?

https://lists.apache.org/thread/hyb6w5c4x5rjt34k3w7zqn3yp5wvf33o

Thanks

Michael



On 14.02.22 at 17:45, Julie Tibshirani wrote:

Hello Michael, I don't have personal experience with these models, but
I found this article insightful:

https://medium.com/@nils_reimers/openai-gpt-3-text-embeddings-really-a-new-state-of-the-art-in-dense-text-embeddings-6571fe3ec9d9

It evaluates the OpenAI models against a variety of existing models on
tasks like sentence similarity and text retrieval. Although the other
models are cheaper and have fewer dimensions, the OpenAI ones perform
similarly or worse. This got me thinking that they might not be a good
cost/effectiveness trade-off, especially the larger ones with 4096
or 12288 dimensions.


Julie

On Sun, Feb 13, 2022 at 1:55 AM Michael Wechner<michael.wech...@wyona.com> wrote:


    Regarding the OpenAI embeddings, the following recent paper might
    be of interest:

    https://arxiv.org/pdf/2201.10005.pdf

    (Text and Code Embeddings by Contrastive Pre-Training, Jan 24, 2022)

    Thanks

    Michael

    On 13.02.22 at 00:14, Michael Wechner wrote:

    Here is a concrete example where I combine the OpenAI model
    "text-similarity-ada-001" with Lucene vector search

    INPUT sentence: "What is your age this year?"

    Result sentences

    1) How old are you this year?
       score '0.98860765'

    2) What was your age last year?
       score '0.97811764'

    3) What is your age?
       score '0.97094905'

    4) How old are you?
       score '0.9600177'


    Result 1 is great. Result 2 looks similar, but is not correct
    from an "understanding" point of view, because it asks about last
    year rather than this year. Results 3 and 4 are good again.

    I understand "similarity" is not the same as "understanding", but
    I hope it makes it clearer what I am looking for :-)

    Thanks

    Michael
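To illustrate how scores like the ones above can arise, here is a minimal Python sketch. It assumes the ranking is based on cosine similarity between embedding vectors, and that the score is the cosine mapped from [-1, 1] to [0, 1] as (1 + cos) / 2; whether that matches the similarity function actually used in the example above is an assumption. The 3-dimensional vectors and the labels "close" and "far" are made up for illustration and are not real OpenAI embeddings.

```python
import math


def cosine(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def normalized_score(a, b):
    """Assumed scoring: map cosine from [-1, 1] to [0, 1]."""
    return (1.0 + cosine(a, b)) / 2.0


def rank(query, candidates):
    """Return (score, text) pairs sorted from most to least similar."""
    scored = [(normalized_score(query, vec), text)
              for text, vec in candidates.items()]
    return sorted(scored, reverse=True)


# Hypothetical toy vectors standing in for sentence embeddings.
query = [1.0, 0.0, 0.0]
candidates = {
    "close": [0.9, 0.1, 0.0],
    "far": [0.0, 1.0, 0.0],
}

for score, text in rank(query, candidates):
    print(f"{text}: {score:.4f}")
```

With such a mapping, vectors that point in nearly the same direction all score close to 1, which is one reason why sentences that differ in a single important word can still end up with very similar scores.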



    On 12.02.22 at 22:38, Michael Wechner wrote:

    Hi Alessandro

    I am mainly interested in detecting similarity, for example
    whether the following two sentences are similar, i.e. likely to
    mean the same thing

    "How old are you?"
    "What is your age?"

    and that the following two sentences are not similar, i.e. do
    not mean the same thing

    "How old are you this year?"
    "How old have you been last year?"

    I am also interested in performance, and in how OpenAI embeddings
    compare with, for example, SBERT
    (https://sbert.net/docs/usage/semantic_textual_similarity.html)

    Thanks

    Michael



    On 12.02.22 at 20:41, Alessandro Benedetti wrote:

    Hi Michael, experience to what extent?
    We have been exploring the area for a while given we
    contributed the first neural search milestone to Apache Solr.
    What are you curious about? Performance? Relevance impact? How to
    integrate it?
    Regards

    On Fri, 11 Feb 2022, 22:38 Michael Wechner,
    <michael.wech...@wyona.com> wrote:

        Hi

        Does anyone have experience using OpenAI embeddings in
        combination with Lucene vector search?

        https://beta.openai.com/docs/guides/embeddings

        for example comparing performance re vector size

        
        https://api.openai.com/v1/engines/text-similarity-ada-001/embeddings

        and

        
        https://api.openai.com/v1/engines/text-similarity-davinci-001/embeddings

        ?

        Thanks

        Michael
