[jira] [Commented] (CASSANDRA-19185) Vector search tests are failing on recall accuracy

Ekaterina Dimitrova (Jira) Sun, 17 Mar 2024 10:36:09 -0700


    [ https://issues.apache.org/jira/browse/CASSANDRA-19185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17827800#comment-17827800 ]


Ekaterina Dimitrova commented on CASSANDRA-19185:
-------------------------------------------------

The config question was addressed and the new CI repeated runs are all green. +1

[~mike_tr_adamson], please propagate the patch to trunk as well and test it 
there. If everything is green there too (which I am 99% sure it will be), we 
can commit it.

Do you need help to commit it, or do you have everything already set?

We follow this guideline:

[https://cassandra.apache.org/_/development/how_to_commit.html]

The template we use for the commit message is at the bottom, along with the 
procedure for merging the patch between branches. Please ping me if you have 
any questions. I am not sure whether you have already made your first commits 
to the Cassandra repo, and the multi-branch strategy is not fun, so I am happy 
to help.

 

> Vector search tests are failing on recall accuracy
> --------------------------------------------------
>
>                 Key: CASSANDRA-19185
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19185
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Feature/SAI
>            Reporter: Mike Adamson
>            Assignee: Mike Adamson
>            Priority: Normal
>             Fix For: 5.0.x, 5.x
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Vector tests are failing randomly because they do not meet recall assertion 
> values. Currently, the following tests have been reported as failing:
> VectorSegmentationTest.testMultipleSegmentsForCompaction
> VectorDistributedTest.rangeRestrictedTest
> VectorDistributedTest.testPartitionRestrictedVectorSearch
> Since the vector searches are approximate and the vectors used in the tests 
> are random, the tests cannot be expected to always achieve high recall. The 
> recall assertions require recall values of 0.9 and above. Part of this 
> issue is related to the use of random values in the vectors being tested. 
> We have seen, with other tests, that the vector search performs better with 
> non-random datasets such as the GloVe datasets. As such, the following 
> options are available to fix these tests.
>  # Downgrade the assertions to a value that is likely to always pass. The 
> problem is that there is no guarantee that a test will always pass any recall 
> value we give it.
>  # Use generated datasets for these tests to see if that improves the recall 
> results.
>  # Remove the recall assertions unless they are specifically asked for. We 
> could use a system property to enable recall testing for targeted vector 
> testing.
> I don't think option 1 is a viable long-term solution, as we can never be 
> certain that it will always work. Option 2 has more promise, but it could 
> still result in failures because of the approximate nature of the vector 
> searches. As such, option 3 seems to be the only viable solution here, but 
> it means that, in most cases, we are only really testing that the search 
> returns results, not how accurate those results are.
>  
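
For context on what the 0.9 assertions measure: assuming the usual definition, recall is the fraction of the exact top-k nearest neighbours that the approximate search actually returns, so a 0.9 threshold means at least 9 of every 10 true neighbours must appear in the results. The Java sketch below is illustrative only; the class and method names are hypothetical and not taken from the Cassandra test suite, and it assumes the exact top-k has already been computed (for example by a brute-force scan).

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class RecallSketch {

    /**
     * Recall: the fraction of the exact top-k neighbours (ground truth) that
     * also appear in the approximate (ANN) search results.
     */
    public static double recall(List<Integer> exactTopK, List<Integer> approximateTopK) {
        if (exactTopK.isEmpty()) {
            return 1.0; // nothing to find, so treat as perfect recall
        }
        Set<Integer> expected = new HashSet<>(exactTopK);
        // distinct() stops a duplicated row from being counted twice
        long hits = approximateTopK.stream().distinct().filter(expected::contains).count();
        return (double) hits / exactTopK.size();
    }

    public static void main(String[] args) {
        // Hypothetical row ids of the 10 true nearest neighbours of a query vector
        List<Integer> exact = List.of(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
        // The approximate search returned 9 of them plus one wrong row (42)
        List<Integer> approximate = List.of(1, 2, 3, 4, 5, 6, 7, 8, 9, 42);

        double r = recall(exact, approximate);
        System.out.println("recall = " + r); // prints: recall = 0.9

        // A 0.9 threshold passes here, but a single extra miss (0.8) would fail it
        if (r < 0.9) {
            throw new AssertionError("recall " + r + " is below 0.9");
        }
    }
}
```

The example shows how little margin a 0.9 threshold leaves at k = 10: one additional missed neighbour drops recall to 0.8 and fails the assertion.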

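Option 3 could look roughly like the following Java sketch. The property name cassandra.test.vector.recall and the helper names are hypothetical (the issue only proposes "a system property"). Every run still checks that the search returned rows, while the recall threshold is enforced only when the property is set to true.

```java
import java.util.List;
import java.util.Locale;

public class RecallGateSketch {

    // Hypothetical property name: the issue only proposes "a system property"
    public static final String RECALL_PROPERTY = "cassandra.test.vector.recall";

    /** True only when the property is set to "true" (e.g. -Dcassandra.test.vector.recall=true). */
    static boolean recallTestingEnabled() {
        return Boolean.getBoolean(RECALL_PROPERTY);
    }

    /**
     * Always checks that the search returned rows; checks recall against the
     * minimum only when recall testing has been explicitly enabled.
     */
    public static void checkSearch(List<Integer> results, double recall, double minimumRecall) {
        if (results.isEmpty()) {
            throw new AssertionError("vector search returned no results");
        }
        if (recallTestingEnabled() && recall < minimumRecall) {
            throw new AssertionError(String.format(Locale.ROOT,
                    "recall %.3f is below the required %.3f", recall, minimumRecall));
        }
    }

    public static void main(String[] args) {
        List<Integer> results = List.of(1, 2, 3, 4, 5, 6, 7, 8, 42, 43);

        // Default run: a low recall of 0.8 is tolerated; only non-empty results are required
        checkSearch(results, 0.8, 0.9);
        System.out.println("default run passed");

        // Targeted run: same data, but now the recall assertion is enforced
        System.setProperty(RECALL_PROPERTY, "true");
        try {
            checkSearch(results, 0.8, 0.9);
        } catch (AssertionError e) {
            System.out.println("targeted run failed: " + e.getMessage());
        } finally {
            System.clearProperty(RECALL_PROPERTY);
        }
    }
}
```

Boolean.getBoolean reads a JVM system property, so targeted vector test runs would opt in with a -D flag while ordinary runs skip the recall check. This keeps the trade-off the issue describes: by default the tests only confirm that results come back, not how accurate they are.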


--
This message was sent by Atlassian Jira
(v8.20.10#820010)

