Another way is to use postings: represent each dimension as a term (`dim0`, `dim1`, etc.) and, for each document, index only the terms for its non-zero dimensions. To encode the value of a dimension you can either provide a custom term frequency or index the term multiple times. Then, when searching, you can form a BooleanQuery from the terms in the sparse query vector and use a simple similarity that sums the term frequencies for ranking. As long as the number of non-zero dimensions in the query is low, this should be efficient.
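
To make the idea concrete, here is a minimal plain-Java sketch of the postings layout. It is a toy in-memory model, not Lucene code: the class `SparsePostingsSketch` and its methods are hypothetical names, and the stored `value` plays the role that a custom term frequency would play in a real index. Multiplying each posting by the query's weight for that dimension is my assumption (it gives a dot product); with all query weights set to 1 it reduces to the plain sum of term frequencies described above. Note also that Lucene term frequencies are integers, so real-valued weights would need to be quantized first.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy in-memory model of "one term per dimension"; hypothetical, not Lucene API.
class SparsePostingsSketch {

    // One posting: a document id plus the value stored for this dimension
    // (the role a custom term frequency would play).
    static final class Posting {
        final int docId;
        final float value;

        Posting(int docId, float value) {
            this.docId = docId;
            this.value = value;
        }
    }

    // term ("dim<N>") -> postings list of documents where that dimension is non-zero
    private final Map<String, List<Posting>> postings = new HashMap<>();

    // Index a sparse vector: only non-zero dimensions produce postings.
    void add(int docId, Map<Integer, Float> sparseVector) {
        for (Map.Entry<Integer, Float> e : sparseVector.entrySet()) {
            if (e.getValue() != 0f) {
                postings.computeIfAbsent("dim" + e.getKey(), k -> new ArrayList<>())
                        .add(new Posting(docId, e.getValue()));
            }
        }
    }

    // Disjunction over the query's non-zero dimensions: each term costs one
    // hash lookup plus a walk over that term's postings only.
    Map<Integer, Float> search(Map<Integer, Float> query) {
        Map<Integer, Float> scores = new HashMap<>();
        for (Map.Entry<Integer, Float> e : query.entrySet()) {
            List<Posting> list = postings.get("dim" + e.getKey());
            if (list == null) {
                continue; // no document has this dimension
            }
            float weight = e.getValue();
            for (Posting p : list) {
                scores.merge(p.docId, weight * p.value, Float::sum);
            }
        }
        return scores;
    }

    public static void main(String[] args) {
        SparsePostingsSketch index = new SparsePostingsSketch();
        index.add(0, Map.of(0, 2f, 3, 1f));
        index.add(1, Map.of(3, 4f, 7, 0.5f));
        // Prints a docId -> score map for a query with two non-zero dimensions.
        System.out.println(index.search(Map.of(3, 1f, 7, 2f)));
    }
}
```

The work done per query is proportional to the total length of the postings lists for the query's non-zero dimensions, which is why a low number of query dimensions keeps this cheap.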

On Mon, Dec 2, 2024 at 1:17 PM Viacheslav Dobrynin <w.v.d...@gmail.com> wrote:
>
> Hi,
>
> Thanks for the reply.
> I haven't tried to do that.
> However, I do not fully understand how in this case an inverted index will
> be constructed for an efficient search by terms (O(1) for each term as a
> key)?
>
>
> On Mon, Dec 2, 2024 at 9:55 PM Patrick Zhai <zhai7...@gmail.com> wrote:
>
> > Hi, have you tried to encode the sparse vector yourself using the
> > BinaryDocValuesField? One way I can think of is to encode it as (size,
> > index_array, value_array) per doc
> > Intuitively I feel like this should be more efficient than one dimension
> > per field if your dimension is high enough
> >
> > Patrick
> >
> > On Mon, Dec 2, 2024, 09:03 Viacheslav Dobrynin <w.v.d...@gmail.com> wrote:
> >
> > > Hi!
> > >
> > > I need to index sparse vectors, whereas as I understand it,
> > > KnnFloatVectorField is designed for dense vectors.
> > > Therefore, it seems that this approach will not work.
> > >
> > > On Sun, Dec 1, 2024 at 6:36 PM Mikhail Khludnev <m...@apache.org> wrote:
> > >
> > > > Hi,
> > > > May it look like KnnFloatVectorField(... DOT_PRODUCT)
> > > > and KnnFloatVectorQuery?
> > > >
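
For comparison, the (size, index_array, value_array) layout suggested earlier in the thread could be sketched as below. This is only an illustration under my own assumptions: `SparseVectorCodec` is a hypothetical helper, the byte layout (4-byte count, then 4-byte indices, then 4-byte floats, big-endian) is one possible choice, and in Lucene the resulting bytes would be wrapped in a BytesRef and stored in a BinaryDocValuesField. Scoring this way has to decode the vector of every candidate document, which is the concern about the missing inverted index raised above.

```java
import java.nio.ByteBuffer;
import java.util.Map;

// Hypothetical helper: packs a sparse vector as (size, index_array, value_array).
class SparseVectorCodec {

    // Layout: [int size][int index_0 .. index_{size-1}][float value_0 .. value_{size-1}]
    static byte[] encode(int[] indices, float[] values) {
        if (indices.length != values.length) {
            throw new IllegalArgumentException("indices and values must have the same length");
        }
        ByteBuffer buf = ByteBuffer.allocate(
                Integer.BYTES + indices.length * (Integer.BYTES + Float.BYTES));
        buf.putInt(indices.length);
        for (int index : indices) {
            buf.putInt(index);
        }
        for (float value : values) {
            buf.putFloat(value);
        }
        return buf.array();
    }

    // Dot product between one encoded document vector and a sparse query.
    static float dot(byte[] encoded, Map<Integer, Float> query) {
        ByteBuffer buf = ByteBuffer.wrap(encoded);
        int size = buf.getInt(0);
        int valuesStart = Integer.BYTES + size * Integer.BYTES;
        float sum = 0f;
        for (int i = 0; i < size; i++) {
            int dim = buf.getInt(Integer.BYTES + i * Integer.BYTES);
            Float weight = query.get(dim);
            if (weight != null) {
                sum += weight * buf.getFloat(valuesStart + i * Float.BYTES);
            }
        }
        return sum;
    }
}
```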