[ https://issues.apache.org/jira/browse/LUCENE-8623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ignacio Vera reassigned LUCENE-8623:
------------------------------------
Assignee: Ignacio Vera
> Decrease I/O pressure when merging high dimensional points
> ----------------------------------------------------------
>
> Key: LUCENE-8623
> URL: https://issues.apache.org/jira/browse/LUCENE-8623
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Ignacio Vera
> Assignee: Ignacio Vera
> Priority: Major
> Attachments: Geo3D.png, LUCENE-8623.patch, LUCENE-8623.patch,
> LUCENE-8623.patch, LUCENE-8623.patch, LatLonPoint.png, LatLonShape.png
>
>
> Related to LUCENE-8619: after indexing 60 million shapes (~1.65 billion
> triangles) using {{LatLonShape}}, the index directory grew to 265 GB
> while segments were being merged. Once merging had finished, the index
> size was 57 GB.
> As an example, imagine we are merging several segments into a new segment of
> size 10 GB (4 dimensions). The BKD tree merging logic will create the
> following files:
> 1) Level 0: 4 copies of the data, each one sorted by one dimension: 40 GB
> 2) Level 1: 6 copies of half of the data, left and right: 30 GB
> 3) Level 2: 6 copies of one quarter of the data, left and right: 15 GB
> 4) Level 3: 6 more copies halving the previous level, left and right: 7.5 GB
> 5) Level 4: 6 more copies halving the previous level, left and right: 3.75 GB
>
> and so on... So it requires around 100 GB to merge that segment.
> This issue proposes delaying the creation of sorted copies until they
> are needed. This reduces the total space required to half of what is needed
> now.
>
>
>
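The "around 100 GB" figure in the quoted description can be checked with a short calculation. The sketch below only reproduces the arithmetic of the numbered list (4 full copies at level 0, then 6 copies per level of data that halves each time). The function name and its parameters are hypothetical and are not part of Lucene.

```python
def merge_temp_space_gb(segment_gb=10.0, num_dims=4, copies_per_level=6, levels=4):
    """Return the temporary space (in GB) used at each level of the merge.

    Hypothetical helper: the defaults mirror the example in the issue
    description (10 GB segment, 4 dimensions, 6 copies per deeper level).
    """
    # Level 0: one full copy of the data per dimension.
    per_level = [num_dims * segment_gb]
    # Level k >= 1: copies_per_level copies, each holding 1/2**k of the data.
    for level in range(1, levels + 1):
        per_level.append(copies_per_level * segment_gb / 2 ** level)
    return per_level


if __name__ == "__main__":
    sizes = merge_temp_space_gb()
    print(sizes)       # [40.0, 30.0, 15.0, 7.5, 3.75]
    print(sum(sizes))  # 96.25
```

With more levels the sum approaches 40 + 60 = 100 GB, since the per-level sizes after level 0 form a geometric series with ratio 1/2.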
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)