gabrielmagno commented on PR #1213:
URL: https://github.com/apache/solr/pull/1213#issuecomment-1337440280
@epugh for this version I combined two "example" models (BERT + item2vec),
just to serve as an example.
If we are willing to provide instructions on how to create the models and
the vectors themselves, I guess it would be better to use a single-model solution,
for simplicity. I could recreate the vectors using only BERT (which I believe
is good enough for our example).
The easiest way I know to create a vector representation of text data is by
using the `sentence_transformers` Python library with a pre-trained BERT model.
It is possible to create vectors with 3 lines of code:
```
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("all-mpnet-base-v2")
my_vector = model.encode("This is my text")
```
The only issue is that the vectors from this model have **768 dimensions**.
For the example I simply took the first 5 dimensions and concatenated them with
the vector from the other model. This is not really an appropriate way to create
the vector in real scenarios. There are other techniques (e.g. Model Distillation)
that could reduce the number of dimensions.
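To make the truncate-and-concatenate step concrete, here is a minimal sketch of what I described above. It uses random numbers as stand-ins for the real model output so it runs without downloading anything. The helper name `truncate_and_concat` and the item2vec vector size of 5 are assumptions for illustration only, not what the PR actually contains.

```python
import random

BERT_DIMS = 768  # output size of the text model, as stated above
KEPT_DIMS = 5    # number of leading text dimensions kept for the example
ITEM_DIMS = 5    # hypothetical item2vec size (assumption, not from the PR)


def truncate_and_concat(text_vector, item_vector, kept_dims=KEPT_DIMS):
    """Keep the first `kept_dims` values of the text vector, then append the item vector."""
    return list(text_vector[:kept_dims]) + list(item_vector)


# Stand-in data: random values instead of real model output (assumption).
rng = random.Random(0)
text_vector = [rng.uniform(-1, 1) for _ in range(BERT_DIMS)]
item_vector = [rng.uniform(-1, 1) for _ in range(ITEM_DIMS)]

combined = truncate_and_concat(text_vector, item_vector)
print(len(combined))  # 10
```

Note that the slice throws away 763 of the 768 text dimensions, which is why this is only acceptable for a toy example.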