[jira] [Commented] (LUCENE-10194) Should IndexWriter buffer KNN vectors on disk?

Mayya Sharipova (Jira) Thu, 07 Jul 2022 12:05:05 -0700


    [ https://issues.apache.org/jira/browse/LUCENE-10194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17563924#comment-17563924 ]


Mayya Sharipova commented on LUCENE-10194:
------------------------------------------

+1 for closing.

I've closed the corresponding PR as well.

> Should IndexWriter buffer KNN vectors on disk?
> ----------------------------------------------
>
>                 Key: LUCENE-10194
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10194
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Assignee: Mayya Sharipova
>            Priority: Minor
>          Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> VectorValuesWriter buffers data in memory, like we do for all data structures 
> that are computed on flush. But I wonder if this is the right trade-off.
> The use case I have in mind is someone trying to load a dataset of vectors into
> Lucene. Given that HNSW graphs are super expensive to create, we'd ideally
> load that dataset into a single segment rather than many small segments that
> then need to be merged together, which in turn re-creates the HNSW graph.
> Yet buffering vectors in memory is expensive. For instance, assuming 256
> dimensions of 4-byte floats, each vector consumes 1kB of memory. Should we consider buffering
> vectors on disk to reduce chances of having to create new segments only 
> because the RAM buffer is full?
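To make the trade-off concrete, here is a minimal Java sketch. It is not Lucene's VectorValuesWriter and not the patch from the closed PR. The class OnDiskVectorBuffer and its methods are hypothetical: the class appends each float32 vector to a temporary file and reads vectors back by ordinal, the way a graph builder might at flush time. The sketch assumes 4-byte floats, so 256 dimensions come to 1024 bytes, the 1kB quoted above. It also uses IndexWriterConfig's default 16 MB RAM buffer for a back-of-the-envelope count that ignores everything else the buffer holds.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;

/** Hypothetical sketch: buffer float vectors in a temp file instead of on the heap. */
public final class OnDiskVectorBuffer implements AutoCloseable {
  private final int dimension;
  private final Path file;
  private final FileChannel channel;
  private final ByteBuffer scratch; // the only per-vector storage kept on the heap
  private int count;

  public OnDiskVectorBuffer(int dimension) throws IOException {
    this.dimension = dimension;
    this.file = Files.createTempFile("vectors", ".bin");
    this.channel = FileChannel.open(file, StandardOpenOption.READ, StandardOpenOption.WRITE);
    this.scratch = ByteBuffer.allocate(bytesPerVector(dimension)).order(ByteOrder.LITTLE_ENDIAN);
  }

  /** Bytes needed for one float32 vector: 4 bytes per dimension. */
  public static int bytesPerVector(int dimension) {
    return dimension * Float.BYTES;
  }

  /** Appends a vector at offset count * bytesPerVector using positional writes. */
  public void add(float[] vector) throws IOException {
    if (vector.length != dimension) {
      throw new IllegalArgumentException("expected " + dimension + " dims, got " + vector.length);
    }
    scratch.clear();
    for (float v : vector) {
      scratch.putFloat(v);
    }
    scratch.flip();
    long pos = (long) count * scratch.capacity();
    while (scratch.hasRemaining()) {
      pos += channel.write(scratch, pos);
    }
    count++;
  }

  /** Reads vector `ord` back from disk, e.g. during HNSW graph construction at flush. */
  public float[] get(int ord) throws IOException {
    scratch.clear();
    long pos = (long) ord * scratch.capacity();
    while (scratch.hasRemaining()) {
      int n = channel.read(scratch, pos);
      if (n < 0) {
        throw new IOException("unexpected end of file");
      }
      pos += n;
    }
    scratch.flip();
    float[] out = new float[dimension];
    for (int i = 0; i < dimension; i++) {
      out[i] = scratch.getFloat();
    }
    return out;
  }

  public int size() {
    return count;
  }

  @Override
  public void close() throws IOException {
    channel.close();
    Files.deleteIfExists(file);
  }

  public static void main(String[] args) throws IOException {
    // 256 dims * 4 bytes = 1024 bytes per vector.
    System.out.println(bytesPerVector(256));
    // With a 16 MB buffer holding nothing but 256-dim vectors, heap buffering
    // would trigger a flush after roughly this many vectors.
    System.out.println(16L * 1024 * 1024 / bytesPerVector(256));
    try (OnDiskVectorBuffer buf = new OnDiskVectorBuffer(3)) {
      buf.add(new float[] {1f, 2f, 3f});
      buf.add(new float[] {4f, 5f, 6f});
      System.out.println(Arrays.toString(buf.get(1)));
    }
  }
}
```

Running main prints 1024, then 16384, then [4.0, 5.0, 6.0]. The sketch trades heap for I/O. Only one scratch vector stays on the heap, but every random access during graph construction becomes a positional file read. That cost is why disk buffering is a trade-off rather than a clear win.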



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
