[jira] [Comment Edited] (LUCENE-10194) Should IndexWriter buffer KNN vectors on disk?

Julie Tibshirani (Jira) Thu, 07 Jul 2022 11:49:27 -0700


    [ 
https://issues.apache.org/jira/browse/LUCENE-10194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17563919#comment-17563919
 ]


Julie Tibshirani edited comment on LUCENE-10194 at 7/7/22 6:48 PM:
-------------------------------------------------------------------

[~mayya] [~jpountz] Can we close this, since we've decided to go ahead with 
LUCENE-10592?


was (Author: julietibs):
[~mayya] [~jpountz] can we close this since we've decided to go ahead with 
https://issues.apache.org/jira/browse/LUCENE-10592 ?

> Should IndexWriter buffer KNN vectors on disk?
> ----------------------------------------------
>
>                 Key: LUCENE-10194
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10194
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Assignee: Mayya Sharipova
>            Priority: Minor
>          Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> VectorValuesWriter buffers data in memory, like we do for all data structures 
> that are computed on flush. But I wonder if this is the right trade-off.
> The use-case I have in mind is someone trying to load a dataset of vectors in 
> Lucene. Given that HNSW graphs are super expensive to create, we'd ideally 
> load that dataset into a single segment rather than many small segments that 
> then need to be merged together, which in turn re-creates the HNSW graph.
> Yet buffering vectors in memory is expensive. For instance, assuming 256 
> dimensions stored as 4-byte floats, each vector consumes 1 kB of memory. 
> Should we consider buffering 
> vectors on disk to reduce chances of having to create new segments only 
> because the RAM buffer is full?
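
The 1 kB figure above can be checked with a small back-of-the-envelope 
sketch. It assumes each dimension is stored as a 4-byte float and ignores 
object headers and every other structure that shares the RAM buffer, so the 
vector count is an upper bound, not a measurement. The class and method names 
are hypothetical and are not part of Lucene; 16 MB is used because it is the 
default RAM buffer size in IndexWriterConfig.

```java
public final class VectorBufferEstimate {
    private VectorBufferEstimate() {}

    /** Raw bytes for one float32 vector, ignoring any per-object overhead. */
    public static long bytesPerVector(int dimensions) {
        return (long) dimensions * Float.BYTES;
    }

    /** Upper bound on how many such vectors fit in a RAM buffer of the given size. */
    public static long vectorsPerBuffer(int dimensions, double ramBufferMB) {
        long bufferBytes = (long) (ramBufferMB * 1024 * 1024);
        return bufferBytes / bytesPerVector(dimensions);
    }

    public static void main(String[] args) {
        // 256 dimensions * 4 bytes = 1024 bytes per vector
        System.out.println(bytesPerVector(256));
        // A 16 MB buffer holds at most 16384 such vectors before a flush
        System.out.println(vectorsPerBuffer(256, 16.0));
    }
}
```

Under these assumptions a default-sized buffer fills after roughly sixteen 
thousand vectors, which is why a large dataset ends up spread across many 
small segments.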



