[
https://issues.apache.org/jira/browse/LUCENE-7396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Adrien Grand updated LUCENE-7396:
---------------------------------
Attachment: LUCENE-7396.patch
I looked into the slowdown with Mike. The radix sort I was using in the 1D
case falls back to quicksort when the range gets narrower, and that fallback
was pretty slow because it called ByteBlockPool.readBytes for every single
byte. It should be better now. I suspect I did not hit the problem with
IndexAndSearchOpenStreetMaps1D because most of the time there was spent on the
first levels of recursion.
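For readers unfamiliar with the shape of such a sort, here is a minimal sketch of an MSB radix sort over fixed-length byte keys that switches to a comparison sort once a range gets small. It is not the code from the patch: the class name, the FALLBACK_THRESHOLD value and the use of plain byte[] keys (rather than values stored in a ByteBlockPool) are all assumptions made for illustration.

```java
import java.util.Arrays;

final class MsbRadixSortSketch {
    // Hypothetical cut-off: ranges smaller than this are handed to a comparison sort.
    static final int FALLBACK_THRESHOLD = 16;

    /** Sorts fixed-length keys in unsigned lexicographic order. */
    static void sort(byte[][] keys, int keyLength) {
        radixSort(keys, 0, keys.length, 0, keyLength);
    }

    static void radixSort(byte[][] keys, int from, int to, int d, int keyLength) {
        if (to - from <= 1 || d == keyLength) {
            return;
        }
        if (to - from < FALLBACK_THRESHOLD) {
            // Narrow range: bytes [0, d) are already equal, so compare from byte d on.
            Arrays.sort(keys, from, to, (a, b) -> compareFrom(a, b, d, keyLength));
            return;
        }
        // Histogram of byte d, shifted by one so the prefix sum yields bucket starts.
        int[] starts = new int[257];
        for (int i = from; i < to; i++) {
            starts[(keys[i][d] & 0xFF) + 1]++;
        }
        for (int b = 0; b < 256; b++) {
            starts[b + 1] += starts[b];
        }
        // Distribute into a temporary array, then copy back in bucket order.
        byte[][] tmp = new byte[to - from][];
        int[] next = starts.clone();
        for (int i = from; i < to; i++) {
            tmp[next[keys[i][d] & 0xFF]++] = keys[i];
        }
        System.arraycopy(tmp, 0, keys, from, tmp.length);
        // Recurse into every bucket on the next byte.
        for (int b = 0; b < 256; b++) {
            radixSort(keys, from + starts[b], from + starts[b + 1], d + 1, keyLength);
        }
    }

    /** Unsigned comparison of a and b over bytes [d, keyLength). */
    static int compareFrom(byte[] a, byte[] b, int d, int keyLength) {
        for (int i = d; i < keyLength; i++) {
            int cmp = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (cmp != 0) {
                return cmp;
            }
        }
        return 0;
    }
}
```

In this sketch the fallback compares whole keys that are already in memory. The slowdown described above came from the fallback fetching each byte through a separate readBytes call, which this sketch does not model.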
The 2D case also improved by using a radix select instead of quickselect when
recursively building the BKD tree. Tie-breaking on doc IDs improved as well: it
now only looks at the relevant bytes, since maxDoc tells us the number of
required bits up front. And IW no longer pre-budgets ords.
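As a rough illustration of the two ideas above, the sketch below shows a radix select over fixed-length byte keys and a helper that derives how many doc ID bytes can ever differ for a given maxDoc. Neither is taken from the patch: RadixSelectSketch, select and docIdBytes are hypothetical names, and the keys are plain byte[] arrays rather than BKD writer structures.

```java
final class RadixSelectSketch {

    /**
     * Rearranges keys[from, to) so that keys[k] holds the key that a full
     * unsigned lexicographic sort would place at index k.
     * Requires from <= k < to.
     */
    static void select(byte[][] keys, int from, int to, int k, int keyLength) {
        for (int d = 0; d < keyLength && to - from > 1; d++) {
            // Histogram of byte d, shifted by one so the prefix sum yields bucket starts.
            int[] starts = new int[257];
            for (int i = from; i < to; i++) {
                starts[(keys[i][d] & 0xFF) + 1]++;
            }
            for (int b = 0; b < 256; b++) {
                starts[b + 1] += starts[b];
            }
            // Partition the range by byte d.
            byte[][] tmp = new byte[to - from][];
            int[] next = starts.clone();
            for (int i = from; i < to; i++) {
                tmp[next[keys[i][d] & 0xFF]++] = keys[i];
            }
            System.arraycopy(tmp, 0, keys, from, tmp.length);
            // Unlike a full sort, only the bucket that contains k needs more work.
            int bucket = 0;
            while (starts[bucket + 1] <= k - from) {
                bucket++;
            }
            to = from + starts[bucket + 1];
            from = from + starts[bucket];
        }
    }

    /**
     * Number of low-order bytes of a doc ID that can differ when doc IDs lie
     * in [0, maxDoc). Assumes maxDoc >= 1.
     */
    static int docIdBytes(int maxDoc) {
        int bits = 32 - Integer.numberOfLeadingZeros(maxDoc - 1);
        return (bits + 7) / 8;
    }
}
```

With maxDoc = 1,000,000 the largest doc ID needs 20 bits, so under these assumptions a tie-break only has to look at the three low-order bytes of each doc ID instead of all four.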
I got the following IW logs when running IndexTaxis and
IndexAndSearchOpenStreetMaps:
{noformat}
master IndexAndSearchOpenStreetMaps, ramBuffer=128MB
IW 0 [2016-07-27T15:38:21.308Z; Thread-0]: 17525 msec to write points
IW 0 [2016-07-27T15:38:44.657Z; Thread-0]: 16746 msec to write points
IW 0 [2016-07-27T15:39:08.278Z; Thread-0]: 16982 msec to write points
IW 0 [2016-07-27T15:39:32.613Z; Thread-0]: 17568 msec to write points
IW 0 [2016-07-27T15:39:56.056Z; Thread-0]: 16684 msec to write points
IW 0 [2016-07-27T15:40:06.646Z; main]: 7324 msec to write points
master IndexTaxis, first 4 flushes
IW 0 [2016-07-27T15:42:10.401Z; Thread-0]: 34422 msec to write points
IW 0 [2016-07-27T15:43:15.561Z; Thread-0]: 32306 msec to write points
IW 0 [2016-07-27T15:44:20.702Z; Thread-0]: 31753 msec to write points
IW 0 [2016-07-27T15:45:24.920Z; Thread-0]: 32340 msec to write points
patch IndexAndSearchOpenStreetMaps, ramBuffer=128MB
IW 0 [2016-07-27T15:55:49.959Z; Thread-0]: 10581 msec to write points
IW 0 [2016-07-27T15:56:08.098Z; Thread-0]: 10306 msec to write points
IW 0 [2016-07-27T15:56:25.445Z; Thread-0]: 10226 msec to write points
IW 0 [2016-07-27T15:56:42.513Z; Thread-0]: 10308 msec to write points
IW 0 [2016-07-27T15:56:59.898Z; Thread-0]: 10162 msec to write points
IW 0 [2016-07-27T15:57:08.497Z; main]: 4593 msec to write points
patch IndexTaxis, first 4 flushes
IW 0 [2016-07-27T15:47:10.906Z; Thread-0]: 25673 msec to write points
IW 0 [2016-07-27T15:48:06.356Z; Thread-0]: 23615 msec to write points
IW 0 [2016-07-27T15:49:03.327Z; Thread-0]: 23915 msec to write points
IW 0 [2016-07-27T15:49:59.424Z; Thread-0]: 23482 msec to write points
{noformat}
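To compare runs without eyeballing the numbers, a small helper along these lines can average the "msec to write points" values of a log excerpt. The class and method names are hypothetical, and the regular expression only assumes the message format shown in the logs above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class FlushTimeSketch {
    // Matches the duration in lines such as "...: 17525 msec to write points".
    private static final Pattern WRITE_POINTS =
        Pattern.compile("(\\d+) msec to write points");

    /** Mean of all "msec to write points" durations found in the given log text. */
    static double meanWritePointsMillis(String log) {
        Matcher m = WRITE_POINTS.matcher(log);
        long sum = 0;
        int count = 0;
        while (m.find()) {
            sum += Long.parseLong(m.group(1));
            count++;
        }
        if (count == 0) {
            throw new IllegalArgumentException("no 'msec to write points' lines found");
        }
        return (double) sum / count;
    }
}
```

Applied to the Thread-0 flushes above, the mean drops from about 17.1s to about 10.3s for IndexAndSearchOpenStreetMaps (roughly 40% less) and from about 32.7s to about 24.2s for IndexTaxis (roughly 26% less).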
> Speed up flush of 1-dimension points
> ------------------------------------
>
> Key: LUCENE-7396
> URL: https://issues.apache.org/jira/browse/LUCENE-7396
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Adrien Grand
> Priority: Minor
> Attachments: LUCENE-7396.patch, LUCENE-7396.patch, LUCENE-7396.patch
>
>
> 1D points already have an optimized merge implementation which works when
> points come in order. So maybe we could make IndexWriter's PointValuesWriter
> sort before feeding the PointsFormat and somehow propagate the information to
> the PointsFormat?
> The benefit is that flushing could directly stream points to disk with little
> memory usage.