gabrielmagno opened a new pull request, #1213:
URL: https://github.com/apache/solr/pull/1213

   https://issues.apache.org/jira/browse/SOLR-16574
   
   # Description
   
   Enrich the `films` example to demonstrate how to use the Dense Vectors 
feature.
   
   # Solution
   
   Added the field `film_vector` to the films dataset. It is a 10-dimensional 
embedding vector that represents each movie. The vector concatenates the first 
5 dimensions of a pre-trained BERT sentence embedding, computed from the movie 
name plus its genre names, with a 5-dimensional item2vec embedding built from 
the itemset co-occurrence of genres across movies. Similar movies are expected 
to be close to each other, but this is only a "toy example" model that serves 
as a source for the film vectors.
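   As a rough illustration of the concatenation described above, here is a 
minimal Python sketch. The function name `build_film_vector` and the sample 
values are hypothetical, and the sketch assumes the two embeddings were already 
computed elsewhere; it is not the script used to generate the dataset.

```python
from typing import List, Sequence


def build_film_vector(
    text_embedding: Sequence[float],
    genre_embedding: Sequence[float],
    dims_per_part: int = 5,
) -> List[float]:
    """Concatenate the leading dimensions of two embeddings.

    text_embedding:  assumed to be a sentence embedding of the movie name
                     plus genre names (only its first dims are kept).
    genre_embedding: assumed to be a 5-dimensional item2vec embedding.
    """
    if len(text_embedding) < dims_per_part:
        raise ValueError("text embedding has too few dimensions")
    if len(genre_embedding) < dims_per_part:
        raise ValueError("genre embedding has too few dimensions")
    # First half: truncated text embedding; second half: genre embedding.
    return list(text_embedding[:dims_per_part]) + list(genre_embedding[:dims_per_part])


# Placeholder values only; real embeddings would come from the trained models.
example_vector = build_film_vector([0.1] * 8, [0.2] * 5)
```

   With these placeholder inputs, `example_vector` has 10 entries: five taken 
from the truncated text embedding followed by five from the genre embedding.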
   
   The example's `README` was also updated to include the definition of the 
Dense Vector field in the schema, and a new section was added with examples 
showing how to make KNN queries with the vectors.
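   For readers unfamiliar with the query shape, the sketch below builds the 
request parameters for a KNN query using Solr's `{!knn}` query parser. The 
helper name `knn_query_params`, the `fl` field list, and the sample vector are 
assumptions made for illustration; the queries in the `README` may differ.

```python
from typing import Dict, Sequence
from urllib.parse import urlencode


def knn_query_params(
    vector: Sequence[float],
    field: str = "film_vector",
    top_k: int = 10,
) -> Dict[str, str]:
    """Build Solr request parameters for a KNN search on a dense vector field."""
    # The knn parser expects the query vector as a bracketed list of floats.
    vector_literal = "[" + ", ".join(repr(float(v)) for v in vector) + "]"
    return {
        "q": f"{{!knn f={field} topK={top_k}}}{vector_literal}",
        # Assumed field list: the film name plus the similarity score.
        "fl": "name,score",
    }


# Hypothetical 10-dimensional query vector.
params = knn_query_params([0.5] * 10, top_k=3)
query_string = urlencode(params)
```

   The resulting `query_string` could be appended to the collection's 
`/select` endpoint, for example `http://localhost:8983/solr/films/select?` on 
a default local install (the host and port are assumptions).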
   
   # Tests
   
   - Added the new field `film_vector` to the 3 dataset formats (JSON, XML, 
CSV), preserving the exact same data from the original datasets, so that the 
diff consists only of the added field.
   - Checked the creation of the collection for the 3 dataset formats. 
Regardless of the format, all 1100 films were indexed, and the `film_vector` 
field was correctly parsed and indexed as well.
   - Checked the KNN example queries for all 3 dataset formats.
   
   # Checklist
   
   Please review the following and check all that apply:
   
   - [X] I have reviewed the guidelines for [How to 
Contribute](https://wiki.apache.org/solr/HowToContribute) and my code conforms 
to the standards described there to the best of my ability.
   - [X] I have created a Jira issue and added the issue ID to my pull request 
title.
   - [X] I have given Solr maintainers 
[access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork)
 to contribute to my PR branch. (optional but recommended)
   - [X] I have developed this patch against the `main` branch.
   - [ ] I have run `./gradlew check`.
   - [ ] I have added tests for my changes.
   - [ ] I have added documentation for the [Reference 
Guide](https://github.com/apache/solr/tree/main/solr/solr-ref-guide)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

