Another way is to use postings: you can represent each dimension as a
term (`dim0`, `dim1`, etc.) and index the terms for the dimensions that
are non-zero in a document. To encode the value for a dimension you can
either provide a custom term frequency or index the term multiple
times. Then, when searching, you can form a BooleanQuery from the terms
in the sparse query vector and use a simple similarity that sums the
term frequencies for ranking. As long as the number of non-zero
dimensions in the query is low, this should be efficient.
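
To make the idea concrete, here is a rough sketch in plain Java. It is
a toy in-memory model, not Lucene API: the class and method names are
made up for illustration, values are assumed to be positive integers
(as term frequencies are, so float weights would need quantizing
first), and query weights are ignored, so the score is just the sum of
the stored frequencies over the query's dimensions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of "one term per dimension, value stored as term frequency". */
class SparsePostingsSketch {
    // term ("dim0", "dim1", ...) -> postings, each posting is {docId, freq}
    private final Map<String, List<int[]>> postings = new HashMap<>();

    /** Index a sparse vector given as dimension -> positive integer value. */
    void addDocument(int docId, Map<Integer, Integer> sparseVector) {
        for (Map.Entry<Integer, Integer> e : sparseVector.entrySet()) {
            postings.computeIfAbsent("dim" + e.getKey(), k -> new ArrayList<>())
                    .add(new int[] {docId, e.getValue()});
        }
    }

    /**
     * Disjunction over the query's non-zero dimensions. Only the postings
     * of those terms are visited; each matching doc accumulates the stored
     * frequency, mimicking a similarity that sums term frequencies.
     */
    Map<Integer, Integer> search(List<Integer> queryDims) {
        Map<Integer, Integer> scores = new HashMap<>();
        for (int dim : queryDims) {
            List<int[]> list = postings.get("dim" + dim);
            if (list == null) {
                continue; // no document has this dimension set
            }
            for (int[] p : list) {
                scores.merge(p[0], p[1], Integer::sum);
            }
        }
        return scores;
    }

    public static void main(String[] args) {
        SparsePostingsSketch index = new SparsePostingsSketch();

        Map<Integer, Integer> doc0 = new HashMap<>();
        doc0.put(0, 3);
        doc0.put(5, 1);
        index.addDocument(0, doc0);

        Map<Integer, Integer> doc1 = new HashMap<>();
        doc1.put(5, 2);
        doc1.put(7, 4);
        index.addDocument(1, doc1);

        // Query with non-zero dimensions 5 and 7.
        Map<Integer, Integer> scores = index.search(Arrays.asList(5, 7));
        System.out.println("doc 0 score: " + scores.get(0));
        System.out.println("doc 1 score: " + scores.get(1));
    }
}
```

The cost of a search here depends on the lengths of the postings lists
for the query's dimensions, not on the total number of dimensions,
which is why a query with few non-zero dimensions stays cheap.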

On Mon, Dec 2, 2024 at 1:17 PM Viacheslav Dobrynin <w.v.d...@gmail.com> wrote:
>
> Hi,
>
> Thanks for the reply.
> I haven't tried to do that.
> However, I do not fully understand how, in this case, an inverted index
> would be constructed for efficient search by term (O(1) lookup with each
> term as a key)?
>
>
> On Mon, Dec 2, 2024 at 21:55, Patrick Zhai <zhai7...@gmail.com> wrote:
>
> > Hi, have you tried encoding the sparse vector yourself using
> > BinaryDocValuesField? One way I can think of is to encode it as (size,
> > index_array, value_array) per doc.
> > Intuitively, I feel this should be more efficient than one field per
> > dimension if the number of dimensions is high enough.
> >
> > Patrick
> >
> > On Mon, Dec 2, 2024, 09:03 Viacheslav Dobrynin <w.v.d...@gmail.com> wrote:
> >
> > > Hi!
> > >
> > > I need to index sparse vectors, but as I understand it,
> > > KnnFloatVectorField is designed for dense vectors.
> > > Therefore, it seems that this approach will not work.
> > >
> > > On Sun, Dec 1, 2024 at 18:36, Mikhail Khludnev <m...@apache.org> wrote:
> > >
> > > > Hi,
> > > > Could this be done with KnnFloatVectorField(... DOT_PRODUCT)
> > > > and KnnFloatVectorQuery?
> > > >
> > >
> >
