[jira] [Commented] (LUCENE-10592) Should we build HNSW graph on the fly during indexing

ASF subversion and git services (Jira) Fri, 22 Jul 2022 08:30:17 -0700


    [ https://issues.apache.org/jira/browse/LUCENE-10592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17570081#comment-17570081 ]


ASF subversion and git services commented on LUCENE-10592:
----------------------------------------------------------

Commit ba4bc0427146669ffd1c41fc0151db33e5a5be33 in lucene's branch 
refs/heads/main from Mayya Sharipova
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=ba4bc042714 ]

LUCENE-10592 Build HNSW Graph on indexing (#992)

Currently, when indexing knn vectors, we buffer them in memory and,
on flush during segment construction, we build an HNSW graph.
As building an HNSW graph is very expensive, this makes the flush
operation take a lot of time. It also makes overall indexing
performance quite unpredictable: some indexing operations return
almost instantly while others that trigger a flush take a lot of time.
This happens because flushes are unpredictable; they are triggered
by the amount of memory used, the presence of concurrent searches, etc.

Building an HNSW graph as we index documents avoids these problems,
as the load of HNSW graph construction is spread evenly across indexing.

Co-authored-by: Adrien Grand <jpou...@gmail.com>
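A minimal Java sketch of the difference described above, under loud assumptions: `SimpleGraph`, `BufferingWriter` and `OnTheFlyWriter` are hypothetical names, not Lucene classes, and the "graph" is a single-layer nearest-neighbour graph built by brute force rather than real HNSW (no layers, no beam search, no neighbour pruning). It only illustrates where the insertion cost lands: entirely inside `flush()` for the buffering writer, and inside each `addVector()` call for the on-the-fly writer.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Random;

public class OnTheFlyGraphDemo {

    // Hypothetical, heavily simplified stand-in for an HNSW graph: one layer, where each new node
    // is linked to its maxConn nearest existing nodes found by brute force. Real HNSW uses
    // multiple layers, beam search and pruning; this only shows WHEN insertion work happens.
    static final class SimpleGraph {
        static long totalInsertions = 0; // demo instrumentation: counts insert() calls across all graphs

        final int maxConn;
        final List<float[]> vectors = new ArrayList<>();
        final List<List<Integer>> neighbors = new ArrayList<>();

        SimpleGraph(int maxConn) { this.maxConn = maxConn; }

        void insert(float[] v) {
            int node = vectors.size();
            List<Integer> candidates = new ArrayList<>();
            for (int i = 0; i < node; i++) candidates.add(i);
            // Sort existing nodes by distance to the new vector and keep the closest maxConn.
            candidates.sort(Comparator.comparingDouble(i -> squaredDistance(v, vectors.get(i))));
            List<Integer> links = new ArrayList<>(candidates.subList(0, Math.min(maxConn, node)));
            vectors.add(v);
            neighbors.add(links);
            for (int n : links) neighbors.get(n).add(node); // make links bidirectional (no pruning)
            totalInsertions++;
        }

        int size() { return vectors.size(); }
    }

    static double squaredDistance(float[] a, float[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    // Old behaviour described in the commit: buffer vectors, build the whole graph on flush.
    static final class BufferingWriter {
        private final int maxConn;
        private final List<float[]> buffer = new ArrayList<>();

        BufferingWriter(int maxConn) { this.maxConn = maxConn; }

        void addVector(float[] v) { buffer.add(v); } // cheap: no graph work yet

        SimpleGraph flush() { // expensive: every insertion happens here
            SimpleGraph graph = new SimpleGraph(maxConn);
            for (float[] v : buffer) graph.insert(v);
            buffer.clear();
            return graph;
        }
    }

    // New behaviour: insert each vector into the graph as soon as it is indexed.
    static final class OnTheFlyWriter {
        private final int maxConn;
        private SimpleGraph graph;

        OnTheFlyWriter(int maxConn) { this.maxConn = maxConn; this.graph = new SimpleGraph(maxConn); }

        void addVector(float[] v) { graph.insert(v); } // graph work spread across calls

        SimpleGraph flush() { // cheap: the graph is already built
            SimpleGraph built = graph;
            graph = new SimpleGraph(maxConn);
            return built;
        }
    }

    public static void main(String[] args) {
        Random random = new Random(42);
        BufferingWriter buffering = new BufferingWriter(8);
        OnTheFlyWriter onTheFly = new OnTheFlyWriter(8);
        for (int doc = 0; doc < 1000; doc++) {
            float[] v = {random.nextFloat(), random.nextFloat(), random.nextFloat()};
            buffering.addVector(v);
            onTheFly.addVector(v);
        }
        long before = SimpleGraph.totalInsertions; // all 1000 so far came from onTheFly
        buffering.flush();
        System.out.println("insertions during buffering flush: " + (SimpleGraph.totalInsertions - before));
        before = SimpleGraph.totalInsertions;
        onTheFly.flush();
        System.out.println("insertions during on-the-fly flush: " + (SimpleGraph.totalInsertions - before));
    }
}
```

Running `main` reports 1000 insertions during the buffering flush and 0 during the on-the-fly flush. The total insertion work is the same in both cases; what changes is that no single call has to absorb all of it, which is the predictability argument made in the commit message.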

> Should we build HNSW graph on the fly during indexing
> -----------------------------------------------------
>
>                 Key: LUCENE-10592
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10592
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Mayya Sharipova
>            Assignee: Mayya Sharipova
>            Priority: Minor
>          Time Spent: 7h 40m
>  Remaining Estimate: 0h
>
> Currently, when we index vectors for KnnVectorField, we buffer those vectors
> in memory and, on flush during segment construction, we build an HNSW graph.
> As building an HNSW graph is very expensive, this makes the flush operation take
> a lot of time. This also makes overall indexing performance quite
> unpredictable (as the number of flushes is determined by the memory used and the
> presence of concurrent searches), e.g. some indexing operations return almost
> instantly while others that trigger a flush take a lot of time.
> Building an HNSW graph on the fly as we index vectors avoids this
> problem and spreads the load of HNSW graph construction evenly during indexing.
> This will also supersede LUCENE-10194.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
