The way I think of this is that segmenting the graph will generally lead to higher recall and higher query-time cost for a given set of HNSW parameters, since each segment's graph is searched separately. Indexing costs will tend to be lower for multiple segmented graphs. I don't think an increase in irrelevant docs should be a concern: after collecting hits from the segments (which, by the way, can be done concurrently), the results are merged and sorted by score, so only the best-scoring hits overall are returned.
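To make the merge step concrete, here is a minimal sketch in Python. It is not Lucene code: the function name merge_top_k, the (doc_id, score) tuples and the sample scores are all made up for illustration, and it assumes that a higher score means a closer match.

```python
import heapq
from typing import List, Tuple

# Hypothetical shapes, for illustration only:
#   per-segment hit: (doc_id, score)
#   merged hit:      (score, segment_index, doc_id)
SegmentHit = Tuple[int, float]
MergedHit = Tuple[float, int, int]


def merge_top_k(per_segment_hits: List[List[SegmentHit]], k: int) -> List[MergedHit]:
    """Merge the hits collected from each segment into one global top-k by score."""
    all_hits = [
        (score, segment, doc_id)
        for segment, hits in enumerate(per_segment_hits)
        for doc_id, score in hits
    ]
    # Keep only the k best-scoring hits across all segments (assumes higher is better).
    return heapq.nlargest(k, all_hits, key=lambda hit: hit[0])


# Each segment returns its own top 3; the low-scoring hits from either
# segment are dropped once the lists are merged.
segment_a = [(3, 0.92), (7, 0.80), (1, 0.41)]
segment_b = [(0, 0.88), (5, 0.35), (2, 0.30)]
top = merge_top_k([segment_a, segment_b], k=3)
```

With these made-up scores, the merged top 3 takes two hits from the first segment and one from the second, and the weaker candidates that each segment contributed never reach the final result.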

On Thu, Nov 3, 2022 at 2:38 PM MyCoy Z <mycoy.zh...@gmail.com> wrote:
>
> Hi, Lucene Developers:
>
> I'm studying the HNSW source code and have some questions regarding Lucene's
> multi-segments and HNSW.
>
> First, some of my understanding:
> 1. While creating the index, when two segments are being merged, it could
> rebuild the HNSW graph based on the docs and vectors in the two segments.
> 2. But while reading the index, each segment's graph is loaded separately.
> There is no way to merge graphs from multiple segments while reading the
> index.
> Please let me know if there is any misunderstanding.
>
>
> Since HNSW is a graph, the connections between the nodes could matter a lot.
> I can imagine some pros and cons here.
> 1. By splitting the docs into multiple separate graphs, it could help the
> diversity by retrieving more docs.
> For example, if just a single graph, some docs could be too far in the
> Neighbor list to be retrieved. And one way to mitigate this is, dividing the
> docs into multiple graphs.
> It could also help to boost the performance.
>
> 2. However, too many segments could cause other issues.
> For example, retrieving too many irrelevant docs, especially if there are
> not so many docs in a segment.
>
>
> So, I think the number of segments and the size of the graphs could have a
> real impact on the retrieving quality and performance.
> I'm wondering if there is any best practice, e.g. how many docs should be in
> a single graph?
> Or does anyone have some production experience to share?
>
> Thanks & Regards
> MyCoy