Hi,

Thanks for the answer! I think this is similar to my initial implementation, where I built the query as follows (PyLucene):
    def build_query(query):
        builder = BooleanQuery.Builder()
        # Add one SHOULD clause per non-zero dimension of the sparse query vector
        for term in torch.nonzero(query):
            field_name = to_field_name(term.item())
            value = query[term].item()
            builder.add(FieldValueAsScoreQuery(field_name, value), BooleanClause.Occur.SHOULD)
        return builder.build()

And as a score, I used the value from the FloatDocValuesField field as follows:

    @Override
    public Scorer get(long leadCost) throws IOException {
        return new Scorer() {
            private final NumericDocValues iterator = context.reader().getNumericDocValues(field);

            @Override
            public float score() throws IOException {
                final int docId = docID();
                assert docId != DocIdSetIterator.NO_MORE_DOCS;
                // Advance outside the assert: a side effect inside `assert`
                // is skipped entirely when assertions are disabled.
                final boolean advanced = iterator.advanceExact(docId);
                assert advanced;
                return Float.intBitsToFloat((int) iterator.longValue()) * queryTermValue * boost;
            }

            @Override
            public int docID() {
                return iterator.docID();
            }

            @Override
            public DocIdSetIterator iterator() {
                return iterator == null ? DocIdSetIterator.empty() : iterator;
            }

            @Override
            public float getMaxScore(int upTo) {
                return Float.MAX_VALUE;
            }
        };
    }

Overall it worked pretty well, thanks for confirming the idea.

On Mon, Dec 2, 2024 at 22:42, Michael Sokolov <msoko...@gmail.com> wrote:

> Another way is using postings - you can represent each dimension as a
> term (`dim0`, `dim1`, etc.) and index those that occur in a document.
> To encode a value for a dimension you can either provide a custom term
> frequency, or index the term multiple times. Then when searching you
> can form a BooleanQuery from the terms in the sparse search vector and
> use a simple similarity that sums the term frequencies for ranking. As
> long as the number of non-zero dimensions in the query is low, this
> should be efficient.
>
> On Mon, Dec 2, 2024 at 1:17 PM Viacheslav Dobrynin <w.v.d...@gmail.com> wrote:
> >
> > Hi,
> >
> > Thanks for the reply.
> > I haven't tried to do that.
> > However, I do not fully understand how an inverted index would be
> > constructed in this case to allow efficient search by terms
> > (O(1) lookup for each term as a key)?
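To make the postings idea from the quote above concrete, here is a tiny self-contained toy in plain Python: each non-zero dimension becomes a term, its value is stored as a "custom term frequency", and searching sums query value times stored value over shared dimensions (the sparse dot product). This is only an illustration of the idea, not the Lucene API; all names here are made up.

```python
from collections import defaultdict

class ToySparseIndex:
    """Toy inverted index over sparse-vector dimensions (illustration only)."""

    def __init__(self):
        # term ("dim3") -> {doc_id: stored value, i.e. the "custom term frequency"}
        self.postings = defaultdict(dict)

    def add(self, doc_id, sparse_vec):
        # Index only the non-zero dimensions of the document vector
        for dim, value in sparse_vec.items():
            if value != 0.0:
                self.postings[f"dim{dim}"][doc_id] = value

    def search(self, sparse_query):
        # Analogue of a BooleanQuery of SHOULD clauses: a doc's score is the
        # sum of query_value * stored_value over dimensions both share,
        # i.e. the sparse dot product.
        scores = defaultdict(float)
        for dim, q_value in sparse_query.items():
            for doc_id, d_value in self.postings.get(f"dim{dim}", {}).items():
                scores[doc_id] += q_value * d_value
        return sorted(scores.items(), key=lambda kv: -kv[1])

index = ToySparseIndex()
index.add(1, {0: 1.0, 3: 2.0})
index.add(2, {3: 1.0, 7: 4.0})
print(index.search({3: 1.0, 7: 1.0}))  # [(2, 5.0), (1, 2.0)]
```

As in the quoted suggestion, this stays efficient as long as the query has few non-zero dimensions, since only those postings lists are visited.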
> >
> > On Mon, Dec 2, 2024 at 21:55, Patrick Zhai <zhai7...@gmail.com> wrote:
> >
> > > Hi, have you tried to encode the sparse vector yourself using a
> > > BinaryDocValuesField? One way I can think of is to encode it as
> > > (size, index_array, value_array) per doc.
> > > Intuitively I feel like this should be more efficient than one
> > > dimension per field if your dimension is high enough.
> > >
> > > Patrick
> > >
> > > On Mon, Dec 2, 2024, 09:03 Viacheslav Dobrynin <w.v.d...@gmail.com> wrote:
> > >
> > > > Hi!
> > > >
> > > > I need to index sparse vectors, whereas as I understand it,
> > > > KnnFloatVectorField is designed for dense vectors.
> > > > Therefore, it seems that this approach will not work.
> > > >
> > > > On Sun, Dec 1, 2024 at 18:36, Mikhail Khludnev <m...@apache.org> wrote:
> > > >
> > > > > Hi,
> > > > > May it look like KnnFloatVectorField(... DOT_PRODUCT)
> > > > > and KnnFloatVectorQuery?

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
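P.S. The (size, index_array, value_array) per-doc encoding suggested in the quoted thread could be sketched in plain Python like this. The exact byte layout below (little-endian int32 count, then int32 indices, then float32 values) is an assumption for illustration; it only models the bytes one might store in a BinaryDocValuesField, and the function names are made up.

```python
import struct

def encode_sparse(indices, values):
    # Layout (illustrative): int32 size, `size` int32 indices, `size` float32 values
    assert len(indices) == len(values)
    n = len(indices)
    return struct.pack(f"<i{n}i{n}f", n, *indices, *values)

def decode_sparse(data):
    (n,) = struct.unpack_from("<i", data, 0)
    indices = struct.unpack_from(f"<{n}i", data, 4)
    values = struct.unpack_from(f"<{n}f", data, 4 + 4 * n)
    return list(indices), list(values)

def dot(query, data):
    # Sparse dot product of a {dim: value} query against one encoded doc
    indices, values = decode_sparse(data)
    return sum(query.get(i, 0.0) * v for i, v in zip(indices, values))

blob = encode_sparse([0, 3, 7], [1.0, 2.0, 4.0])
print(dot({3: 1.0, 7: 1.0}, blob))  # 6.0
```

With this layout each document costs 4 + 8 * size bytes, so for high-dimensional but sparse vectors it is compact, and scoring is a single sequential decode per matching doc.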