Hey Michael,

Yeah, the Apache Lucene field type used by Elasticsearch is FeatureField: https://lucene.apache.org/core/10_3_2/core/org/apache/lucene/document/FeatureField.html
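
In case it helps, here is a rough plain-Python sketch of the idea (not Lucene code). Each non-zero component of the sparse embedding becomes one feature, and a document's score is the weighted sum over the features it shares with the query. The first three (token id, weight) pairs are copied from the tensor quoted below. The query weights, the use of the token id as feature name, and the helper names `to_features` and `linear_score` are assumptions made up for illustration.

```python
# First three (token id, weight) pairs of the embedding of
# "He drove to the stadium"; the full vector has 59 non-zero entries.
doc_embedding = {1996: 0.2426, 2000: 1.2840, 2001: 0.4095}

# Hypothetical query embedding, invented for this example.
query_embedding = {2000: 0.5, 2001: 2.0, 9999: 1.0}


def to_features(embedding):
    """Map each non-zero component to a (feature name, value) pair.

    On the Lucene side each pair would become one FeatureField under a
    shared field name. Using the token id as the feature name is an
    assumption, not something Lucene prescribes.
    """
    return [
        (str(token_id), weight)
        for token_id, weight in sorted(embedding.items())
        if weight > 0
    ]


def linear_score(query, doc):
    """Sum query_weight * doc_value over the features present in both.

    This models what a disjunction of linear queries adds up to, and
    ignores the reduced precision FeatureField stores values with.
    """
    return sum(weight * doc[token_id]
               for token_id, weight in query.items()
               if token_id in doc)


features = to_features(doc_embedding)
score = linear_score(query_embedding, doc_embedding)
print(features)
print(round(score, 4))
```

Token 9999 does not occur in the document, so it contributes nothing to the score. The Javadoc linked above describes the constraints FeatureField places on feature values and query weights.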
To query, build a boolean query over the non-zero components, with one clause per component created by `newLinearQuery`: https://lucene.apache.org/core/10_3_2/core/org/apache/lucene/document/FeatureField.html#newLinearQuery(java.lang.String,java.lang.String,float)

Hope this helps!

Ben

On Mon, Jan 26, 2026 at 9:47 AM Michael Wechner <[email protected]> wrote:

> Hi
>
> I recently started to explore sparse embeddings using the sbert /
> sentence_transformers library
>
> https://sbert.net/docs/sparse_encoder/usage/usage.html
>
> whereas for example the following sentence "He drove to the stadium"
> gets embedded as follows:
>
> tensor(indices=tensor([[    0,     0,     0,     0,     0,     0,     0,     0,
>                             0,     0,     0,     0,     0,     0,     0,     0,
>                             0,     0,     0,     0,     0,     0,     0,     0,
>                             0,     0,     0,     0,     0,     0,     0,     0,
>                             0,     0,     0,     0,     0,     0,     0,     0,
>                             0,     0,     0,     0,     0,     0,     0,     0,
>                             0,     0,     0,     0,     0,     0,     0,     0,
>                             0,     0,     0],
>                        [ 1996,  2000,  2001,  2002,  2010,  2018,  2032,  2056,
>                          2180,  2209,  2253,  2277,  2288,  2299,  2343,  2346,
>                          2359,  2365,  2374,  2380,  2441,  2482,  2563,  2688,
>                          2724,  2778,  2782,  2958,  3116,  3230,  3298,  3309,
>                          3346,  3478,  3598,  3942,  4019,  4062,  4164,  4306,
>                          4316,  4322,  4439,  4536,  4716,  5006,  5225,  5439,
>                          5533,  5581,  5823,  6891,  7281,  7467,  7921,  8514,
>                          9065, 11037, 21028]]),
>        values=tensor([0.2426, 1.2840, 0.4095, 1.3777, 0.6331, 0.7404, 0.2711,
>                       0.3561, 0.0691, 0.0325, 0.1355, 0.3256, 0.0203, 0.7970,
>                       0.0535, 0.1135, 0.0227, 0.0375, 0.8167, 0.5986, 0.3390,
>                       0.2573, 0.1621, 0.2597, 0.2726, 0.0191, 0.0752, 0.0597,
>                       0.2644, 0.7811, 1.4855, 0.0663, 2.8099, 0.4074, 0.0778,
>                       1.0642, 0.1952, 0.7472, 0.7306, 0.1108, 0.5747, 1.5341,
>                       1.9030, 0.2264, 0.0995, 0.3023, 1.1830, 0.1279, 0.7824,
>                       0.4283, 0.0288, 0.3535, 0.1833, 0.0554, 0.2662, 0.0574,
>                       0.4963, 0.2751, 0.0340]),
>        device='mps:0', size=(1, 30522), nnz=59, layout=torch.sparse_coo)
>
> The zeros just mean, that all tokens belong to the first sentence "He
> drove to the stadium" denoted by 0.
>
> Then the 59 relevant token Ids (of the vocabulary of size 30522) are
> listed and third the importance weights for the relevant tokens.
>
> IIUC OpenSearch and Elasticsearch are both supporting sparse embeddings
>
> https://sbert.net/examples/sparse_encoder/applications/semantic_search/README.html#opensearch-integration
>
> https://sbert.net/examples/sparse_encoder/applications/semantic_search/README.html#elasticsearch-integration
>
> but are sparse embeddings also supported by Lucene itself?
>
> Thanks
>
> Michael
