gabrielmagno opened a new pull request, #1213: URL: https://github.com/apache/solr/pull/1213
https://issues.apache.org/jira/browse/SOLR-16574

# Description

Enrich the `films` example to demonstrate how to use the Dense Vectors feature.

# Solution

Added the field `film_vector` to the films dataset. This is a 10-dimensional embedding vector created to represent the movie. The vector is created by combining the first 5 dimensions of a pre-trained BERT sentence model applied to the name of the movie plus the names of its genres, followed by a 5-dimensional item2vec model of itemset co-occurrence of genres in the movies, totaling 10 dimensions. Even though it is expected that similar movies will be close to each other, this is just a "toy example" model to serve as a source for creating the film vectors.

The `README` of the example was also updated to include the specification of the Dense Vector field in the schema. Also, a new section was created, with examples showing how to make KNN queries with the vectors.

# Tests

- Added the new field `film_vector` to the 3 dataset formats (JSON, XML, CSV), making sure to preserve the exact same data from the original datasets, so that the "diff" will be only the addition of the new field.
- Checked the creation of the collection for the 3 dataset formats. Regardless of the format, all the 1100 films were indexed, and the `film_vector` field was correctly parsed and indexed as well.
- Checked the KNN example queries for all the 3 dataset formats.

# Checklist

Please review the following and check all that apply:

- [X] I have reviewed the guidelines for [How to Contribute](https://wiki.apache.org/solr/HowToContribute) and my code conforms to the standards described there to the best of my ability.
- [X] I have created a Jira issue and added the issue ID to my pull request title.
- [X] I have given Solr maintainers [access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork) to contribute to my PR branch.
(optional but recommended)
- [X] I have developed this patch against the `main` branch.
- [ ] I have run `./gradlew check`.
- [ ] I have added tests for my changes.
- [ ] I have added documentation for the [Reference Guide](https://github.com/apache/solr/tree/main/solr/solr-ref-guide)

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
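
For readers who want to see the shape of the `film_vector` construction described under Solution, here is a minimal sketch. It is not the script used to generate the dataset: the helper names `build_film_vector` and `knn_query` are hypothetical, the input numbers are made-up placeholders rather than real model output, and the two source embeddings are assumed to be already computed. The query string follows the `{!knn f=... topK=...}[...]` form of Solr's KNN query parser.

```python
from typing import List, Sequence


def build_film_vector(bert_embedding: Sequence[float],
                      genre_embedding: Sequence[float],
                      dims_per_part: int = 5) -> List[float]:
    """Concatenate the first `dims_per_part` BERT dimensions with the
    item2vec genre vector (hypothetical helper, for illustration only)."""
    if len(bert_embedding) < dims_per_part:
        raise ValueError("BERT embedding has fewer dimensions than requested")
    if len(genre_embedding) != dims_per_part:
        raise ValueError("genre embedding must have exactly dims_per_part dimensions")
    return list(bert_embedding[:dims_per_part]) + list(genre_embedding)


def knn_query(vector: Sequence[float],
              field: str = "film_vector",
              top_k: int = 10) -> str:
    """Format a vector as the value of the `q` parameter for a KNN query."""
    values = ", ".join(repr(float(v)) for v in vector)
    return "{!knn f=%s topK=%d}[%s]" % (field, top_k, values)


if __name__ == "__main__":
    # Placeholder inputs: NOT taken from the real BERT or item2vec models.
    bert = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7]   # longer vector, truncated to 5
    genres = [1.0, 2.0, 3.0, 4.0, 5.0]           # 5-dimensional genre vector
    film_vector = build_film_vector(bert, genres)
    print(len(film_vector))
    print(knn_query(film_vector))
```

The resulting string would be sent as the `q` parameter of a request to the collection's `/select` handler; the number of values must match the dimension declared for the Dense Vector field in the schema.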
