benwtrent commented on issue #14758:
URL: https://github.com/apache/lucene/issues/14758#issuecomment-3606499629
> But, this would limit the HNSW graph optimizing to one sort criteria?
I thought index sorting could be done over as many fields as you like. But
sure, I can open a separate issue here (I still think this would help y'all's
issue, since you could argue it indirectly helps things that tend to be
filtered together).
> is a tiny tiny sliver of the catalog, and filtered KNN search on that
> sliver is horrible today.
Is it because exploration of the graph is expensive? With ACORN, we should
only be doing vector ops against things that match a filter.
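
As a rough illustration of "only doing vector ops against things that match a filter", here is a toy Python sketch loosely in the spirit of ACORN. It is not Lucene's implementation: the function name `neighbors_to_score`, the `passes` predicate, and the dict-based graph are all made up for this example. Neighbors that fail the filter are never scored; their own neighbors are checked instead, so the walk can step over non-matching nodes.

```python
def neighbors_to_score(graph, node, passes, visited):
    """Return unvisited filter-passing nodes reachable from `node`.

    Direct neighbors that pass the filter are returned as-is. A neighbor
    that fails the filter is never scored; its own neighbors are checked
    instead (a two-hop expansion), so the walk can step over non-matches.
    """
    out = []
    for nb in graph[node]:
        if nb in visited:
            continue
        visited.add(nb)
        if passes(nb):
            out.append(nb)
            continue
        # nb fails the filter: look one hop further instead of scoring it.
        for nb2 in graph[nb]:
            if nb2 not in visited and passes(nb2):
                visited.add(nb2)
                out.append(nb2)
    return out


# Hypothetical toy graph: node 0 links to 1, 2, 3; node 2 (a non-match)
# links on to 4 and 5. Only nodes 1 and 4 pass the filter.
graph = {0: [1, 2, 3], 1: [0], 2: [0, 4, 5], 3: [0], 4: [2], 5: [2]}
matches = {1, 4}
to_score = neighbors_to_score(graph, 0, matches.__contains__, {0})
print(to_score)  # only these nodes would need a distance computation
```

In this toy case only two nodes would be handed to the distance function, even though node 0 has three direct neighbors.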
Then I assume it's the cost of simply reading in the graph? Or is it because we
aren't applying the filter until we get to the bottom layer? (e.g. what if we
"seeded" the search with some vectors that pass the filter to kick start the
bottom layer search in addition to the nearest entry point?).
My main concern is that completely restructuring how raw vectors are stored
in the flat file (so that it is no longer organized by field), or creating
many graphs, are very very big changes that only help a very very particular
use case. Again, these concerns aren't worth blocking these ideas; it's just
that the ideas are very very big and may fundamentally change the format. I'm
just trying to think of marginal changes that can help in the right direction.
Whereas if we:
- Just made graph traversal nicer
- Dynamically seeded the bottom layer based on filter restrictions
that might give us more bang for the buck.
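
To make the seeding idea a bit more concrete, here is a minimal toy sketch in plain Python. It is not Lucene code and makes several assumptions: the graph is a single chain standing in for the bottom layer, `search_bottom_layer` and its parameters are hypothetical names, the traversal only scores and follows nodes that pass the filter, and the seed (node 18) is simply assumed to be some document known to match the filter. The query sits next to node 0, while the only matching nodes are 15 through 19.

```python
import heapq


def search_bottom_layer(graph, dist, seeds, passes, k):
    """Best-first search that scores and follows only filter-passing nodes.

    Seeds are always scored, even if they fail the filter (the regular
    entry point may not match). Returns (ids of up to k nearest matches,
    nearest first; number of distance computations performed).
    """
    visited = set(seeds)
    candidates = []  # min-heap of (distance, node) still to expand
    results = []     # max-heap via negated distance, best k matches so far
    scored = 0
    for s in seeds:
        d = dist(s)
        scored += 1
        heapq.heappush(candidates, (d, s))
        if passes(s):
            heapq.heappush(results, (-d, s))
            if len(results) > k:
                heapq.heappop(results)
    while candidates:
        d, node = heapq.heappop(candidates)
        if len(results) == k and d > -results[0][0]:
            break  # nearest open candidate is worse than the worst result
        for nb in graph[node]:
            if nb in visited or not passes(nb):
                continue  # never score a node that fails the filter
            visited.add(nb)
            dn = dist(nb)
            scored += 1
            if len(results) < k or dn < -results[0][0]:
                heapq.heappush(candidates, (dn, nb))
                heapq.heappush(results, (-dn, nb))
                if len(results) > k:
                    heapq.heappop(results)
    hits = [n for _, n in sorted((-nd, n) for nd, n in results)]
    return hits, scored


# Toy bottom layer: a chain 0-1-2-...-19, with node i at position i.
graph = {i: [j for j in (i - 1, i + 1) if 0 <= j < 20] for i in range(20)}
dist = lambda n: abs(n - 0.0)   # query is at position 0
passes = lambda n: n >= 15      # the filter matches a small, distant sliver

unseeded = search_bottom_layer(graph, dist, [0], passes, k=2)
seeded = search_bottom_layer(graph, dist, [0, 18], passes, k=2)
print(unseeded, seeded)
```

In this toy setup the search from the nearest entry point alone gets stuck immediately, because none of its neighbors match the filter, while adding one matching seed lets the search walk the matching sliver and find the nearest matches with only a handful of distance computations. How to pick good seeds cheaply in a real index is left open here.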
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]