Hi all

I recently set up the ChatGPT retrieval plugin locally

https://github.com/openai/chatgpt-retrieval-plugin

I think it would be nice to consider submitting a Lucene implementation for this plugin

https://github.com/openai/chatgpt-retrieval-plugin#future-directions

By default the plugin uses OpenAI's model "text-embedding-ada-002", which produces embeddings with 1536 dimensions

https://openai.com/blog/new-and-improved-embedding-model

This means one won't be able to use it out of the box with Lucene, which currently caps vectors at 1024 dimensions.

There is a similar request here

https://learn.microsoft.com/en-us/answers/questions/1192796/open-ai-text-embedding-dimensions

I understand we just recently had a lengthy discussion about increasing the max dimension. Whatever one thinks of OpenAI, the fact is that it has a huge impact, and I think it would be nice if Lucene could be part of this "revolution". All we have to do is increase the limit from 1024 to 1536, or even 2048, for example.
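
To illustrate why the limit gets in the way, here is a minimal Python sketch. It is not Lucene's actual code: MAX_DIMENSIONS, check_dimension and fits are hypothetical names, and the only facts taken from this mail are the limit of 1024 and the 1536 dimensions of text-embedding-ada-002.

```python
# Hypothetical stand-in for the limit discussed in this thread (not Lucene code).
MAX_DIMENSIONS = 1024

# Embedding size of text-embedding-ada-002, as stated above.
ADA_002_DIMENSIONS = 1536


def check_dimension(dimension: int, max_dimensions: int = MAX_DIMENSIONS) -> None:
    """Raise ValueError if a vector of this dimension is not allowed."""
    if dimension <= 0:
        raise ValueError(f"vector dimension must be positive, got {dimension}")
    if dimension > max_dimensions:
        raise ValueError(
            f"vector dimension {dimension} exceeds the maximum of {max_dimensions}"
        )


def fits(dimension: int, max_dimensions: int = MAX_DIMENSIONS) -> bool:
    """Return True if a vector of this dimension passes the check."""
    try:
        check_dimension(dimension, max_dimensions)
        return True
    except ValueError:
        return False


if __name__ == "__main__":
    # Try the current limit and the two proposed ones.
    for limit in (1024, 1536, 2048):
        print(limit, fits(ADA_002_DIMENSIONS, limit))
```

With the limit at 1024 the 1536-dimensional embedding is rejected; with either of the proposed limits it is accepted.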

Performance seems to scale linearly with the vector dimension, several members have run performance tests successfully, and 1024 seems to have been chosen as the max dimension quite arbitrarily in the first place. So I think it should not be a problem to increase the max dimension by a factor of 1.5 or 2.
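
To make the arithmetic concrete, here is a rough Python sketch. It only models the per-comparison work of a plain dot product and the raw storage of a float32 vector. It says nothing about how the HNSW graph behaves and is no substitute for the performance tests mentioned above. All function names are hypothetical.

```python
def dot_product(a, b):
    """Return the dot product of a and b and the number of multiply-add steps."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    total = 0.0
    steps = 0
    for x, y in zip(a, b):
        total += x * y
        steps += 1  # one multiply-add per dimension
    return total, steps


def relative_cost(new_dim: int, old_dim: int = 1024) -> float:
    """Ratio of multiply-add steps per comparison, new dimension vs. old."""
    _, new_steps = dot_product([1.0] * new_dim, [1.0] * new_dim)
    _, old_steps = dot_product([1.0] * old_dim, [1.0] * old_dim)
    return new_steps / old_steps


def bytes_per_vector(dimension: int, bytes_per_float: int = 4) -> int:
    """Raw storage for one vector, assuming 4-byte (float32) components."""
    return dimension * bytes_per_float


if __name__ == "__main__":
    for dim in (1024, 1536, 2048):
        print(dim, relative_cost(dim), bytes_per_vector(dim))
```

Under these assumptions, going from 1024 to 1536 costs 1.5 times the work and storage per vector, and going to 2048 costs twice as much.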

WDYT?

Thanks

Michael


