[
https://issues.apache.org/jira/browse/LUCENE-10194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17563919#comment-17563919
]
Julie Tibshirani edited comment on LUCENE-10194 at 7/7/22 6:48 PM:
-------------------------------------------------------------------
[~mayya] [~jpountz] can we close this since we've decided to go ahead with
LUCENE-10592?
was (Author: julietibs):
[~mayya] [~jpountz] can we close this since we've decided to go ahead with
https://issues.apache.org/jira/browse/LUCENE-10592 ?
> Should IndexWriter buffer KNN vectors on disk?
> ----------------------------------------------
>
> Key: LUCENE-10194
> URL: https://issues.apache.org/jira/browse/LUCENE-10194
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Adrien Grand
> Assignee: Mayya Sharipova
> Priority: Minor
> Time Spent: 4h 10m
> Remaining Estimate: 0h
>
> VectorValuesWriter buffers data in memory, like we do for all data structures
> that are computed on flush. But I wonder if this is the right trade-off.
> The use-case I have in mind is someone trying to load a dataset of vectors in
> Lucene. Given that HNSW graphs are super expensive to create, we'd ideally
> load that dataset into a single segment rather than many small segments that
> then need to be merged together, which in turn re-creates the HNSW graph.
> Yet buffering vectors in memory is expensive. For instance, assuming 256
> float dimensions, each vector consumes 1kB of memory (256 x 4 bytes). Should
> we consider buffering vectors on disk to reduce the chances of having to
> create new segments only because the RAM buffer is full?
--
This message was sent by Atlassian Jira
(v8.20.10#820010)