[jira] [Commented] (LUCENE-9148) Move the BKD index to its own file.

Michael McCandless (Jira) Wed, 06 May 2020 05:08:42 -0700


    [ https://issues.apache.org/jira/browse/LUCENE-9148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17100741#comment-17100741 ]


Michael McCandless commented on LUCENE-9148:
--------------------------------------------

{quote}Not one file per field, it would be horrible. :)
{quote}
OK phew :)
{quote}The motivation for splitting the index and data files is that they have
different access patterns. For instance, finding nearest neighbors is pretty
intense on the index, and I believe some users might want to keep it in RAM, so
having it in a different file from the data file will help users leverage
MMapDirectory#setPreload and FileSwitchDirectory to do so.
{quote}
This sounds great – the better locality should also be a performance win even
if you do not explicitly warm the index (e.g. using {{setPreload}}).
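A minimal JDK-only sketch of why a separate index file helps here. It assumes, as we understand it, that MMapDirectory#setPreload asks the OS to make mapped pages resident via MappedByteBuffer#load(), and that FileSwitchDirectory picks a directory per file based on its extension. The class PreloadSketch, its methods, and the extensions kdi (index) and kdd (data) are hypothetical illustrations, not Lucene API: only files whose extension is selected get mapped and loaded, which is only useful if the BKD inner nodes live in their own file.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not Lucene API: preloads only files whose extension is selected.
final class PreloadSketch {
    // Map in 1 GiB chunks, since one MappedByteBuffer cannot exceed Integer.MAX_VALUE bytes.
    private static final long CHUNK = 1L << 30;

    /** Returns the text after the last '.', or "" if the name has no extension. */
    static String extension(String fileName) {
        int dot = fileName.lastIndexOf('.');
        return dot == -1 ? "" : fileName.substring(dot + 1);
    }

    /**
     * Memory-maps every regular file in dir whose extension is in preloadExtensions and
     * calls MappedByteBuffer#load() on each chunk, a best-effort request for the OS to
     * make those pages resident. Returns the names of the selected files, sorted.
     */
    static List<String> preload(Path dir, Set<String> preloadExtensions) throws IOException {
        List<Path> paths;
        try (Stream<Path> s = Files.list(dir)) {
            paths = s.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        }
        List<String> loaded = new ArrayList<>();
        for (Path p : paths) {
            String name = p.getFileName().toString();
            if (!preloadExtensions.contains(extension(name))) {
                continue; // e.g. the data file with leaf blocks stays paged in on demand
            }
            try (FileChannel ch = FileChannel.open(p, StandardOpenOption.READ)) {
                long size = ch.size();
                for (long pos = 0; pos < size; pos += CHUNK) {
                    long len = Math.min(CHUNK, size - pos);
                    ch.map(FileChannel.MapMode.READ_ONLY, pos, len).load();
                }
            }
            loaded.add(name);
        }
        return loaded;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("bkd-sketch");
        Files.write(dir.resolve("_0.kdi"), new byte[4096]);  // hypothetical index file
        Files.write(dir.resolve("_0.kdd"), new byte[65536]); // hypothetical data file
        System.out.println(preload(dir, Set.of("kdi")));
    }
}
```

Running main prints [_0.kdi]: only the selected index file is mapped and loaded, while the data file is left alone. In Lucene itself one would rather combine FileSwitchDirectory with an MMapDirectory that has setPreload(true) than write such a helper; the sketch only shows that per-file choices require the index and the leaves to be in different files.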
{quote}It would be nice to consider LUCENE-9291 for this change. 
{quote}
Maybe keep these two changes separate if possible? It is usually best to
strongly separate rote refactoring (no functionality changed) from changes in
functionality. Or perhaps this change could help inform which approach and
interfaces we want in LUCENE-9291.

> Move the BKD index to its own file.
> -----------------------------------
>
>                 Key: LUCENE-9148
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9148
>             Project: Lucene - Core
>          Issue Type: Task
>            Reporter: Adrien Grand
>            Priority: Minor
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Lucene60PointsWriter stores both inner nodes and leaf nodes in the same file,
> interleaved. For instance, if you have two fields, you would have
> {{<leaf_nodes_A, inner_nodes_A, leaf_nodes_B, inner_nodes_B>}}. This is not
> ideal, since leaves and inner nodes have quite different access patterns.
> Should we split this into two files? When the BKD index is off-heap, this
> would also make it easier to force it into RAM with
> {{MMapDirectory#setPreload}}.
> Note that Lucene60PointsFormat already has a file that it calls "index", but
> it only maps fields to file pointers in the other file, which is not what
> I'm discussing here. That said, we could possibly store the BKD indices in
> this existing file if we want to avoid creating a new one.
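To make the two layouts in the description concrete, here is a small sketch that computes where each field's leaf blocks and inner nodes would start, first for the interleaved single-file layout <leaf_nodes_A, inner_nodes_A, leaf_nodes_B, inner_nodes_B> and then for a split into a data file (leaves) and an index file (inner nodes). The per-field map of file pointers plays the role of the existing "index" file the description mentions. BkdLayoutSketch, its records, and the byte sizes are made up for illustration; real Lucene files also carry headers, footers, and checksums, which are ignored here.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of file layouts, not Lucene's actual on-disk format.
final class BkdLayoutSketch {
    /** Sizes in bytes of one field's leaf blocks and of its inner-node index (made-up inputs). */
    record FieldTree(String field, long leafBytes, long innerBytes) {}

    /**
     * Start offsets of a field's leaves and inner nodes. In the interleaved layout both
     * point into the same file; in the split layout leafStart points into the data file
     * and innerStart into the index file.
     */
    record Pointers(long leafStart, long innerStart) {}

    /** One file: each field writes its leaves, then its inner nodes, then the next field follows. */
    static Map<String, Pointers> interleaved(List<FieldTree> trees) {
        Map<String, Pointers> out = new LinkedHashMap<>();
        long fp = 0;
        for (FieldTree t : trees) {
            long leafStart = fp;
            fp += t.leafBytes();
            long innerStart = fp;
            fp += t.innerBytes();
            out.put(t.field(), new Pointers(leafStart, innerStart));
        }
        return out;
    }

    /** Two files: leaves go to the data file, inner nodes to the index file, each with its own pointer. */
    static Map<String, Pointers> split(List<FieldTree> trees) {
        Map<String, Pointers> out = new LinkedHashMap<>();
        long dataFp = 0;
        long indexFp = 0;
        for (FieldTree t : trees) {
            out.put(t.field(), new Pointers(dataFp, indexFp));
            dataFp += t.leafBytes();
            indexFp += t.innerBytes();
        }
        return out;
    }

    public static void main(String[] args) {
        List<FieldTree> trees = List.of(new FieldTree("A", 1000, 40), new FieldTree("B", 500, 20));
        System.out.println("interleaved: " + interleaved(trees));
        System.out.println("split:       " + split(trees));
    }
}
```

With these made-up sizes, the interleaved layout puts A's inner nodes at offset 1000 and B's at 1540 of the same file, separated by B's leaves. In the split layout they occupy bytes [0, 40) and [40, 60) of the index file, so all inner nodes are contiguous and can be preloaded or cached without touching any leaf block.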



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
