The way I think of this is that, for a given set of HNSW parameters,
segmenting the graph will generally lead to higher recall and higher
query-time cost, since each segment's graph is searched separately.
Indexing costs will tend to be lower with multiple smaller segment
graphs. I don't think the increased number of irrelevant docs should
be a concern: results are collected from each segment (which, by the
way, can be done concurrently) and then merged and sorted by score, so
only the overall top hits are kept.
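
To make the collect-then-merge step concrete, here is a minimal sketch in plain Java. It is not Lucene's implementation and uses no Lucene types: the class `SegmentMergeSketch`, the `Hit` record and `searchSegment` are hypothetical, and brute-force dot-product scoring stands in for the per-segment HNSW graph search. Each segment is searched on its own thread and returns its own top k; the merge then sorts all candidates by score and keeps only the global top k.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class SegmentMergeSketch {

    /** One hit: a global doc id and its similarity score (higher is better). */
    record Hit(int docId, float score) {}

    /**
     * Hypothetical stand-in for searching one segment's HNSW graph: it scores every
     * vector in the segment by dot product and keeps the segment's own top k.
     */
    static List<Hit> searchSegment(float[][] vectors, int docBase, float[] query, int k) {
        // Min-heap ordered by score: the head is the weakest hit kept so far.
        PriorityQueue<Hit> heap = new PriorityQueue<>(Comparator.comparingDouble(Hit::score));
        for (int i = 0; i < vectors.length; i++) {
            float score = 0f;
            for (int d = 0; d < query.length; d++) {
                score += vectors[i][d] * query[d];
            }
            heap.offer(new Hit(docBase + i, score));
            if (heap.size() > k) {
                heap.poll(); // evict the weakest hit once more than k are held
            }
        }
        return new ArrayList<>(heap);
    }

    /** Searches all segments concurrently, then merges their hits into one global top k. */
    static List<Hit> search(List<float[][]> segments, float[] query, int k) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, segments.size()));
        try {
            List<Future<List<Hit>>> futures = new ArrayList<>();
            int docBase = 0; // doc ids are offset per segment so they stay globally unique
            for (float[][] segment : segments) {
                final int base = docBase;
                futures.add(pool.submit(() -> searchSegment(segment, base, query, k)));
                docBase += segment.length;
            }
            List<Hit> candidates = new ArrayList<>();
            for (Future<List<Hit>> f : futures) {
                candidates.addAll(f.get()); // wait for each segment's top k
            }
            // Merge: sort all candidates by descending score and keep only the best k overall,
            // so weak hits contributed by small segments are dropped at this step.
            candidates.sort(Comparator.comparingDouble(Hit::score).reversed());
            return new ArrayList<>(candidates.subList(0, Math.min(k, candidates.size())));
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<float[][]> segments = List.of(
                new float[][] {{1f, 0f}, {0f, 1f}}, // segment 0: docs 0 and 1
                new float[][] {{0.9f, 0.1f}});      // segment 1: doc 2
        List<Hit> top = search(segments, new float[] {1f, 0f}, 2);
        // Doc 1 scores 0 against this query and is left out of the global top 2.
        for (Hit hit : top) {
            System.out.println(hit.docId() + " -> " + hit.score());
        }
    }
}
```

Running `main` should list doc 0 and then doc 2. Note that every segment is asked for the full k, so with S segments the merge sees up to S * k candidates and S separate searches are run; that is one way to see why query cost grows with the number of segments while the merged result stays limited to k hits.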

On Thu, Nov 3, 2022 at 2:38 PM MyCoy Z <mycoy.zh...@gmail.com> wrote:
>
> Hi, Lucene Developers:
>
> I'm studying the HNSW source code and have some questions regarding how 
> Lucene's multiple segments interact with HNSW.
>
> First, some of my understanding:
> 1. While creating the index, when two segments are being merged, it could 
> rebuild the HNSW graph based on the docs and vectors in the two segments.
> 2. But while reading the index, each segment's graph is loaded separately.
>     There is no way to merge graphs from multiple segments while reading the 
> index.
> Please let me know if there is any misunderstanding.
>
>
> Since HNSW is a graph, the connections between the nodes could matter a lot.
> I can imagine some pros and cons here.
> 1. Splitting the docs into multiple separate graphs could improve 
> diversity by retrieving more docs.
>     For example, with a single graph, some docs could be too far down the 
> neighbor list to be retrieved. One way to mitigate this is to divide the 
> docs into multiple graphs.
>     It could also improve performance.
>
> 2. However, too many segments could cause other issues.
>     For example, retrieving too many irrelevant docs, especially if a 
> segment contains only a few docs.
>
>
> So, I think the number of segments and the size of the graphs could have a 
> real impact on retrieval quality and performance.
> I'm wondering whether there is a best practice, e.g. how many docs should be 
> in a single graph.
> Or does anyone have production experience to share?
>
> Thanks & Regards
> MyCoy

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
