Pardon me for reviving this dead thread, but this might be of interest:

https://openai.com/index/new-embedding-models-and-api-updates/
A text-embedding-3-large embedding can be shortened to a size of 256 while
still outperforming an unshortened text-embedding-ada-002 embedding with a
size of 1536.
It's based on Matryoshka Representation Learning (MRL).
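To make the idea concrete: with an MRL-trained model, shortening an embedding amounts to keeping a prefix of the vector and re-normalizing it. That would bring the 3072-dimensional text-embedding-3-large output under the 1024-dimension limit debated below. The Python here is a minimal sketch under that assumption. `shorten_embedding` and `LUCENE_MAX_DIMS` are illustrative names, not part of any OpenAI or Lucene API. OpenAI's API also accepts a `dimensions` parameter that does the shortening server-side.

```python
import math

# Hypothetical constant: the default vector-dimension limit debated in this thread.
LUCENE_MAX_DIMS = 1024


def shorten_embedding(embedding: list[float], dims: int) -> list[float]:
    """Truncate an MRL-style embedding to its first `dims` coordinates and
    rescale the prefix to unit L2 norm, so dot-product / cosine scores
    remain meaningful after truncation."""
    if not 0 < dims <= len(embedding):
        raise ValueError(f"dims must be in 1..{len(embedding)}, got {dims}")
    prefix = embedding[:dims]
    norm = math.sqrt(sum(x * x for x in prefix))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero-norm prefix")
    return [x / norm for x in prefix]


if __name__ == "__main__":
    # Stand-in for a 3072-dimensional text-embedding-3-large vector (not real model output).
    full = [1.0] * 3072
    short = shorten_embedding(full, 256)
    print(len(short), len(short) <= LUCENE_MAX_DIMS)  # 256 True
    print(shorten_embedding([3.0, 4.0, 12.0], 2))     # [0.6, 0.8]
```

Plain prefix truncation only makes sense for models trained this way. For other models, the leading coordinates carry no special weight, so truncation would likely degrade quality much more.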

On Tue, May 23, 2023 at 12:27 PM Alessandro Benedetti <[email protected]>
wrote:

> Closing the poll after one week, these are the results:
>
> Option 2-4: 9 votes
> make the limit configurable, potentially moving the limit to the
> appropriate place
>
> Option 3: 5 votes
> keep it as it is (1024) but move it lower level in HNSW-specific
> implementation
>
> Option 1: 0 votes
> keep it as it is (1024)
>
> -----
> I was expecting more people to express their preferences; unfortunately,
> many digressed into discussions without expressing any.
> Given that, it seems clear that we want one of the most voted options, so
> let's continue the discussions under the related Pull Requests and then
> proceed to merging once agreement is found!
>
> Thanks to everyone involved!
>
>
> --------------------------
> *Alessandro Benedetti*
> Director @ Sease Ltd.
> *Apache Lucene/Solr Committer*
> *Apache Solr PMC Member*
>
>
> On Mon, 22 May 2023 at 09:17, Bruno Roustant <[email protected]>
> wrote:
>
>> I vote for option 3.
>> Then, as follow-up work, add a simple extension codec in the
>> "codecs" package that is
>> 1) not backward compatible and 2) has a higher or configurable limit.
>> That way users can use this codec directly without any additional code.
>>
>

-- 
Sincerely yours
Mikhail Khludnev
