Hi
I recently started to explore sparse embeddings using the sbert /
sentence_transformers library
https://sbert.net/docs/sparse_encoder/usage/usage.html
where, for example, the sentence "He drove to the stadium" gets
embedded as shown in the output below.
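Roughly the code I used (a minimal sketch; the model name is the one
from the usage docs linked above, so treat it as an assumption on my
side):

from sentence_transformers import SparseEncoder

# SPLADE-style sparse encoder taken from the sbert usage docs
model = SparseEncoder("naver/splade-cocondenser-ensembledistil", device="mps")

# encode() returns a sparse COO tensor of shape (num_sentences, vocab_size)
embeddings = model.encode(["He drove to the stadium"])
print(embeddings)

which prints: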
tensor(indices=tensor([[    0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
                            0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
                            0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
                            0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
                            0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
                            0,     0,     0,     0,     0,     0,     0,     0,     0],
                       [ 1996,  2000,  2001,  2002,  2010,  2018,  2032,  2056,  2180,  2209,
                         2253,  2277,  2288,  2299,  2343,  2346,  2359,  2365,  2374,  2380,
                         2441,  2482,  2563,  2688,  2724,  2778,  2782,  2958,  3116,  3230,
                         3298,  3309,  3346,  3478,  3598,  3942,  4019,  4062,  4164,  4306,
                         4316,  4322,  4439,  4536,  4716,  5006,  5225,  5439,  5533,  5581,
                         5823,  6891,  7281,  7467,  7921,  8514,  9065, 11037, 21028]]),
       values=tensor([0.2426, 1.2840, 0.4095, 1.3777, 0.6331, 0.7404, 0.2711, 0.3561, 0.0691, 0.0325,
                      0.1355, 0.3256, 0.0203, 0.7970, 0.0535, 0.1135, 0.0227, 0.0375, 0.8167, 0.5986,
                      0.3390, 0.2573, 0.1621, 0.2597, 0.2726, 0.0191, 0.0752, 0.0597, 0.2644, 0.7811,
                      1.4855, 0.0663, 2.8099, 0.4074, 0.0778, 1.0642, 0.1952, 0.7472, 0.7306, 0.1108,
                      0.5747, 1.5341, 1.9030, 0.2264, 0.0995, 0.3023, 1.1830, 0.1279, 0.7824, 0.4283,
                      0.0288, 0.3535, 0.1833, 0.0554, 0.2662, 0.0574, 0.4963, 0.2751, 0.0340]),
       device='mps:0', size=(1, 30522), nnz=59, layout=torch.sparse_coo)
The zeros just mean that all tokens belong to the first (and only)
input sentence "He drove to the stadium", denoted by index 0. The
second indices row lists the 59 active token IDs (out of the
vocabulary of size 30522), and the values tensor holds the importance
weight of each of those tokens.
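To make sense of the IDs I mapped them back to vocabulary tokens via
the underlying Hugging Face tokenizer (a sketch, reusing the model
and embeddings from above; I believe SparseEncoder also offers a
decode() helper for this, but the manual version makes the structure
obvious):

# Pull the active token IDs and their weights out of the COO tensor
emb = embeddings.coalesce()
ids = emb.indices()[1].tolist()
weights = emb.values().tolist()

# Map IDs back to vocabulary tokens and show the heaviest ones
tokens = model.tokenizer.convert_ids_to_tokens(ids)
for token, weight in sorted(zip(tokens, weights), key=lambda p: -p[1])[:10]:
    print(f"{token}: {weight:.4f}")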
IIUC both OpenSearch and Elasticsearch support sparse embeddings:
https://sbert.net/examples/sparse_encoder/applications/semantic_search/README.html#opensearch-integration
https://sbert.net/examples/sparse_encoder/applications/semantic_search/README.html#elasticsearch-integration
But are sparse embeddings also supported by Lucene itself?
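For context, my understanding of the linked examples is that the
integration essentially indexes the non-zero tokens as a
token-to-weight map. A sketch of what I mean, for Elasticsearch
(index and field names are mine, and I'm assuming the sparse_vector
field type and the official Python client; OpenSearch seems to use
rank_features for the same idea):

from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# "sparse_vector" stores a token -> weight map (assumption based on
# the Elasticsearch docs; the index/field names are made up)
es.indices.create(
    index="sparse-demo",
    mappings={"properties": {"embedding": {"type": "sparse_vector"}}},
)

# token -> weight map from the decoding snippet above
es.index(
    index="sparse-demo",
    document={
        "text": "He drove to the stadium",
        "embedding": dict(zip(tokens, weights)),
    },
)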
Thanks
Michael