I think we can support both parameters, k and threshold. If we need to get all docs within the threshold, we can just set k = Integer.MAX_VALUE.
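To make the k + threshold idea concrete, here is a minimal sketch. It is not Lucene code: `ThresholdGraphSearch` and `search` are hypothetical names. The sketch also makes three simplifying assumptions. It searches a single-layer proximity graph rather than Lucene's multi-layer HNSW. It uses squared Euclidean distance as the metric. It lets k double as the search beam width.

Neighbours at or beyond the threshold are discarded, and k caps the result set. With k = Integer.MAX_VALUE, only the radius bounds the traversal. The number of expanded nodes then grows with the number of matches, which is the cost Adrien raises below.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical sketch of a k-bounded, threshold-pruned best-first graph search (not Lucene API). */
public class ThresholdGraphSearch {

    /** Squared Euclidean distance between two vectors of equal length. */
    static float squaredDistance(float[] a, float[] b) {
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            float d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    /**
     * Returns ids of at most k nodes whose squared distance to the query is below
     * the threshold, closest first. Pass k = Integer.MAX_VALUE to keep every match
     * the traversal reaches.
     */
    static List<Integer> search(float[][] vectors, int[][] neighbors, int entryPoint,
                                float[] query, int k, float threshold) {
        if (k < 1) throw new IllegalArgumentException("k must be >= 1");
        float[] dist = new float[vectors.length];
        boolean[] visited = new boolean[vectors.length];

        // Nodes still to expand, closest first.
        PriorityQueue<Integer> candidates = new PriorityQueue<>((a, b) -> Float.compare(dist[a], dist[b]));
        // Kept results, farthest first so the worst one can be evicted.
        PriorityQueue<Integer> results = new PriorityQueue<>((a, b) -> Float.compare(dist[b], dist[a]));

        visited[entryPoint] = true;
        dist[entryPoint] = squaredDistance(query, vectors[entryPoint]);
        candidates.add(entryPoint); // the entry point is always expanded
        if (dist[entryPoint] < threshold) results.add(entryPoint);

        while (!candidates.isEmpty()) {
            int current = candidates.poll();
            // With a full result set, stop once the closest candidate is farther than the worst result.
            if (results.size() >= k && dist[current] > dist[results.peek()]) break;
            for (int nb : neighbors[current]) {
                if (visited[nb]) continue;
                visited[nb] = true;
                dist[nb] = squaredDistance(query, vectors[nb]);
                if (dist[nb] >= threshold) continue; // prune: outside the radius, never expanded
                if (results.size() < k || dist[nb] < dist[results.peek()]) {
                    candidates.add(nb);
                    results.add(nb);
                    if (results.size() > k) results.poll(); // keep at most k results
                }
            }
        }
        List<Integer> out = new ArrayList<>(results);
        out.sort((a, b) -> Float.compare(dist[a], dist[b]));
        return out;
    }

    public static void main(String[] args) {
        // Six 1-D points on a line, linked as a chain 0-1-2-3-4-5.
        float[][] vectors = {{0f}, {1f}, {2f}, {3f}, {4f}, {5f}};
        int[][] neighbors = {{1}, {0, 2}, {1, 3}, {2, 4}, {3, 5}, {4}};
        float[] query = {0.2f};
        // Threshold only (k unbounded): every reachable node with squared distance < 5.
        System.out.println(search(vectors, neighbors, 0, query, Integer.MAX_VALUE, 5f)); // prints [0, 1, 2]
        // Same threshold, but capped at k = 2.
        System.out.println(search(vectors, neighbors, 0, query, 2, 5f)); // prints [0, 1]
    }
}
```

Because neighbours beyond the threshold are never expanded, the sketch misses matches that are reachable only through out-of-radius nodes. That approximation is inherent in pruning candidates by distance.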
On Thu, Nov 10, 2022 at 12:43, Adrien Grand <jpou...@gmail.com> wrote:

> I wonder if it would actually be a good idea to support filtering _only_
> based on distance. In the worst case scenario, this may require traversing
> the whole HNSW graph and would run in linear time with the number of
> vectors, with a high constant factor since we'd need to compute a distance
> for every vector? I imagine that this would only make sense for low values
> of the radius, so that few vectors would match, but this looks to me like
> it would be hard to predict whether a given radius would actually match a
> small set of vectors. Should the query still require a `k` value in
> addition to the radius to make sure it doesn't go wild?
>
> On Tue, Nov 8, 2022 at 7:26 AM Alexey Gorlenko <agorlen...@gmail.com>
> wrote:
>
>> Thanks, Michael!
>> Yes, I will try.
>>
>> On Tue, Nov 8, 2022 at 03:31, Michael Sokolov <msoko...@gmail.com> wrote:
>>
>>> +1 to adding a scoring threshold. I think it could be another
>>> parameter to KnnVectorQuery. Do you want to have a try at adding this?
>>> If so, please feel free to open a PR and I will be happy to guide you.
>>>
>>> On Mon, Nov 7, 2022 at 6:38 AM Alexey Gorlenko <agorlen...@gmail.com>
>>> wrote:
>>> >
>>> > Hi!
>>> >
>>> > There are some use cases where we need to find vectors whose distance
>>> > (by some metric) to a given vector V is less than a given threshold T.
>>> > That task is very similar to the knn problem, but in this case we don't
>>> > have a fixed number k of nearest neighbours.
>>> >
>>> > As far as I can see, the current implementation of knn doesn't provide
>>> > such functionality. But at first glance it is not very difficult to
>>> > modify the search method of HnswGraph to implement that feature (do not
>>> > limit the result size, and discard candidates that exceed the threshold).
>>> >
>>> > But maybe that idea has some non-obvious problems that I haven't
>>> > noticed, and in reality implementing it would run into fundamental
>>> > difficulties?
>>> >
>
> --
> Adrien