[Wikitech-l] Re: Word embeddings / vector search

2023-05-16 Thread Dan Andreescu
> On Meta there's a list of mailing lists that mentions "wikimedia-search",
> but that list seems to be dead and the archive is full of spam.
> Another list exists, called "discovery", but not listed on Meta.
> https://lists.wikimedia.org/hyperkitty/list/discov...@lists.wikimedia.org/
Indeed

[Wikitech-l] Re: Word embeddings / vector search

2023-05-12 Thread Lars Aronsson
On 2023-05-09 22:09, Isaac Johnson wrote:
> +1 to the suggestion to connect with the Search team. Also a few more thoughts about vector / natural-language search and its relevance to Wikimedia from my perspective in Research:
> * The common critique of lexical / keyword-based search and why

[Wikitech-l] Re: Word embeddings / vector search

2023-05-09 Thread Daniel Kinzler
Hi Lars! It's certainly not a new idea; I literally wrote my master's thesis on it (in German). It's an interesting idea, but it's not easy to make it work properly. There is a lot of noise in the data. Here's a presentation I

[Wikitech-l] Re: Word embeddings / vector search

2023-05-09 Thread Amir Sarabadani
On top of the ones mentioned, the ORES topic detection model (the one that says which WikiProject an article belongs to) has been using word embeddings since 2018-ish. HTH
On Tue, 9.

[Wikitech-l] Re: Word embeddings / vector search

2023-05-09 Thread Isaac Johnson
+1 to the suggestion to connect with the Search team. Also a few more thoughts about vector / natural-language search and its relevance to Wikimedia from my perspective in Research:
- The common critique of lexical / keyword-based search and why folks point to vector / embedding-based

[Wikitech-l] Re: Word embeddings / vector search

2023-05-09 Thread Dan Andreescu
I encourage you to reach out to the search team, they're lovely folks and even better engineers.
On Tue, May 9, 2023 at 1:53 PM Lars Aronsson wrote:
> On 2023-05-09 09:27, Thiemo Kreuz wrote:
> > I'm curious what the actual question is. The basic concepts are
> > studied for about 60 years, and

[Wikitech-l] Re: Word embeddings / vector search

2023-05-09 Thread Lars Aronsson
On 2023-05-09 09:27, Thiemo Kreuz wrote:
> I'm curious what the actual question is. The basic concepts are studied for about 60 years, and are in use for about 20 to 30 years.
Sorry to hear that you're so negative. It's quite obvious that this is not currently used in Wikipedia, but is presented

[Wikitech-l] Re: Word embeddings / vector search

2023-05-09 Thread Thiemo Kreuz
I'm curious what the actual question is. The basic concepts have been studied for about 60 years and have been in use for about 20 to 30 years. One particular detail the industry apparently needs to re-learn every time is how easily such vector spaces encode and reproduce any existing bias, racism, phobia,