[ 
https://issues.apache.org/jira/browse/LUCENE-7396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrien Grand updated LUCENE-7396:
---------------------------------
    Attachment: LUCENE-7396.patch

Here is a patch that demonstrates the idea. PointValuesWriter sorts the points 
before feeding the PointsFormat's writer, and Lucene60PointsWriter.addField 
consumes the PointsReader twice in the one-dimensional case: once to detect 
whether it is sorted, and once to write the values, similarly to the way 
merging works. Even though the reader is consumed twice, indexing got faster 
with IndexAndSearchOpenStreetMaps1D: 3 runs on master took 74, 78 and 75 
seconds, while 3 runs with this patch took 65, 67 and 67 seconds.

It would be nice if we could somehow propagate the information that the reader 
is sorted, rather than having to iterate over it twice.
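
To make the two-pass idea concrete, here is a minimal standalone sketch. It is 
not the code from the patch and does not use Lucene's classes: the class name 
OneDimFlushSketch, the long[] input and the LongConsumer sink are all 
hypothetical stand-ins for the reader and the on-disk writer. The first pass 
checks whether the values are sorted; the second streams them to the sink in 
order, and only falls back to sorting a copy when the check fails.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.LongConsumer;

// Hypothetical illustration only; none of these names exist in Lucene.
public class OneDimFlushSketch {

    /** First pass: returns true if the values are in non-decreasing order. */
    static boolean isSorted(long[] values) {
        for (int i = 1; i < values.length; i++) {
            if (values[i - 1] > values[i]) {
                return false;
            }
        }
        return true;
    }

    /**
     * Second pass: writes the values to the sink in sorted order.
     * If the input is already sorted it is streamed as-is, with no extra
     * buffering; otherwise a sorted copy is made first (the slow path).
     * Returns true if the streaming path was taken.
     */
    static boolean flush(long[] values, LongConsumer sink) {
        boolean sorted = isSorted(values);
        long[] toWrite = values;
        if (!sorted) {
            toWrite = values.clone();
            Arrays.sort(toWrite);
        }
        for (long v : toWrite) {
            sink.accept(v);
        }
        return sorted;
    }

    public static void main(String[] args) {
        List<Long> out = new ArrayList<>();
        boolean streamed = flush(new long[] {1, 2, 2, 5}, v -> out.add(v));
        System.out.println("streamed=" + streamed + " values=" + out);
    }
}
```

If the producer could tell the consumer up front that the values are sorted, 
the isSorted pass in this sketch would simply go away, which is the point of 
the question above.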

> Speed up flush of 1-dimension points
> ------------------------------------
>
>                 Key: LUCENE-7396
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7396
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Priority: Minor
>         Attachments: LUCENE-7396.patch
>
>
> 1D points already have an optimized merge implementation which works when 
> points come in order. So maybe we could make IndexWriter's PointValuesWriter 
> sort before feeding the PointsFormat and somehow propagate the information to 
> the PointsFormat?
> The benefit is that flushing could directly stream points to disk with little 
> memory usage.


