Pardon me for reviving a dead thread, but this might be of interest:
https://openai.com/index/new-embedding-models-and-api-updates/

According to the announcement, a text-embedding-3-large embedding can be shortened to a size of 256 while still outperforming an unshortened text-embedding-ada-002 embedding with a size of 1536. This is possible because the model is trained with Matryoshka Representation Learning (MRL).
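For anyone curious what "shortening" means in practice, here is a minimal sketch, assuming the embedding comes from an MRL-trained model so that its leading components carry most of the information. It keeps the first `dim` components and rescales the result to unit length so that cosine or dot-product similarity still behaves. The function name `shorten_embedding` is mine, purely for illustration; as far as I understand, the announcement also describes a `dimensions` API parameter that does the shortening server-side.

```python
import math
from typing import List, Sequence


def shorten_embedding(vec: Sequence[float], dim: int) -> List[float]:
    """Keep the first `dim` components and L2-normalize the result.

    Assumption: `vec` comes from an MRL-trained model, so a prefix of the
    vector is itself a usable (lower-quality) embedding.
    """
    if not 0 < dim <= len(vec):
        raise ValueError(f"dim must be in 1..{len(vec)}, got {dim}")
    head = list(vec[:dim])
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        # An all-zero prefix cannot be normalized; return it unchanged.
        return head
    return [x / norm for x in head]


if __name__ == "__main__":
    # Toy 4-dimensional vector standing in for a real 3072-dimensional one.
    full = [3.0, 4.0, 12.0, 84.0]
    short = shorten_embedding(full, 2)
    print(len(short), short)
```

A vector shortened this way to 256 or 1024 dimensions would fit under the current 1024 limit without any change on the Lucene side, which is why I thought it relevant to this thread.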

On Tue, May 23, 2023 at 12:27 PM Alessandro Benedetti <[email protected]> wrote:

> Closing the poll after one week, these are the results:
>
> Option 2-4: 9 votes
> make the limit configurable, potentially moving the limit to the
> appropriate place
>
> Option 3: 5 votes
> keep it as it is (1024) but move it lower level in HNSW-specific
> implementation
>
> Option 1: 0 votes
> keep it as it is (1024)
>
> -----
> I was expecting more people to express their preferences, unfortunately,
> many digressed to discussions without expressing any.
> Given that, it seems clear that we want one of the most voted options, so
> let's continue the discussions under the related Pull Requests and then
> proceed to merges when agreement if found!
>
> Thanks to everyone involved!
>
>
> --------------------------
> *Alessandro Benedetti*
> Director @ Sease Ltd.
> *Apache Lucene/Solr Committer*
> *Apache Solr PMC Member*
>
> e-mail: [email protected]
>
>
> *Sease* - Information Retrieval Applied
> Consulting | Training | Open Source
>
> Website: Sease.io <http://sease.io/>
> LinkedIn <https://linkedin.com/company/sease-ltd> | Twitter
> <https://twitter.com/seaseltd> | Youtube
> <https://www.youtube.com/channel/UCDx86ZKLYNpI3gzMercM7BQ> | Github
> <https://github.com/seaseltd>
>
>
> On Mon, 22 May 2023 at 09:17, Bruno Roustant <[email protected]>
> wrote:
>
>> I vote for option 3.
>> Then with a follow up work to have a simple extension codec in the
>> "codecs" package which is
>> 1- not backward compatible, and 2- has a higher or configurable limit.
>> That way users can directly use this codec without any additional code.
>>
>

--
Sincerely yours
Mikhail Khludnev
