[ https://issues.apache.org/jira/browse/LUCENE-10194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mayya Sharipova closed LUCENE-10194.
------------------------------------
> Should IndexWriter buffer KNN vectors on disk?
> ----------------------------------------------
>
> Key: LUCENE-10194
> URL: https://issues.apache.org/jira/browse/LUCENE-10194
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Adrien Grand
> Assignee: Mayya Sharipova
> Priority: Minor
> Time Spent: 4.5h
> Remaining Estimate: 0h
>
> VectorValuesWriter buffers data in memory, like we do for all data structures
> that are computed on flush. But I wonder if this is the right trade-off.
> The use case I have in mind is someone trying to load a dataset of vectors into
> Lucene. Given that HNSW graphs are very expensive to create, we'd ideally
> load that dataset into a single segment rather than into many small segments
> that then need to be merged, which in turn rebuilds the HNSW graph.
> Yet buffering vectors in memory is expensive. For instance, assuming 256
> float32 dimensions, each vector consumes 1 kB of memory (256 x 4 bytes).
> Should we consider buffering vectors on disk to reduce the chance of having
> to create new segments only because the RAM buffer is full?
--
This message was sent by Atlassian Jira
(v8.20.10#820010)