I think we can support both parameters: k and threshold. And if we need to
get all docs by the threshold, we just will set k == Integer.MAX_VALUE.

чт, 10 нояб. 2022 г. в 12:43, Adrien Grand <jpou...@gmail.com>:

> I wonder if it would actually be a good idea to support filtering _only_
> based on distance. In the worst case scenario, this may require traversing
> the whole HNSW graph and would run in linear time with the number of
> vectors, with a high constant factor since we'd need to compute a distance
> for every vector? I imagine that this would only make sense for low values
> of the radius, so that few vectors would match, but this looks to me like
> it would be hard to predict whether a given radius would actually match a
> small set of vectors. Should the query still require a `k` value in
> addition to the radius to make sure it doesn't go wild?
>
> On Tue, Nov 8, 2022 at 7:26 AM Alexey Gorlenko <agorlen...@gmail.com>
> wrote:
>
>> Thanks, Michael!
>> Yes, I will try.
>>
>> вт, 8 нояб. 2022 г. в 03:31, Michael Sokolov <msoko...@gmail.com>:
>>
>>> +1 to adding a scoring threshold. I think it could be another
>>> parameter to KnnVectorQuery. Do you want to have a try at adding this?
>>> If so, please feel free to open a PR and I will be happy to guide you.
>>>
>>> On Mon, Nov 7, 2022 at 6:38 AM Alexey Gorlenko <agorlen...@gmail.com>
>>> wrote:
>>> >
>>> > Hi!
>>> >
>>> > There are some use cases where we need to find vectors with the
>>> distance (by some metric) to the given vector V less than the given
>>> threshold T. That task is very similar to the knn problem, but in this case
>>> we don't have a quantity of the nearest neighbours k.
>>> >
>>> > As I see, the current implementation of knn doesn't provide such
>>> functionality. But at the first glance it is not very difficult to modify
>>> the method search of HnswGraph to implement that feature (do not limit
>>> result size and get rid of candidates which exceed threshold).
>>> >
>>> > But maybe that idea has some not obvious problems which I haven't
>>> noticed, and in reality an implementation of that idea would have
>>> fundamental difficulties?
>>> >
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>>> For additional commands, e-mail: dev-h...@lucene.apache.org
>>>
>>>
>
> --
> Adrien
>

Reply via email to