[
https://issues.apache.org/jira/browse/LUCENE-7071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Michael McCandless updated LUCENE-7071:
---------------------------------------
Attachment: LUCENE-7071.patch
Patch that avoids copying bytes when the referenced slice already lies within a
single {{byte[]}} block from the pool. The new APIs look a bit ugly, but they
are package private, so I think that's OK? I also can't think of any cleaner
way to pack bytes so that nothing is "wasted" while still avoiding a copy in
the common case.
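To illustrate the idea (this is not the code in the patch), here is a minimal, self-contained sketch. The names {{BlockPoolSketch}}, {{Ref}}, {{append}} and {{setRef}} are hypothetical stand-ins and are not Lucene's {{ByteBlockPool}} or {{BytesRef}} API. It assumes values are packed back to back across fixed-size blocks, and that a reader only copies when a slice straddles a block boundary.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a block-based byte pool; not Lucene's ByteBlockPool API. */
final class BlockPoolSketch {
  /** Minimal analogue of BytesRef: a (bytes, offset, length) view over some array. */
  static final class Ref {
    byte[] bytes;
    int offset;
    int length;
  }

  private final int blockSize;
  private final List<byte[]> blocks = new ArrayList<>();
  private long size; // total number of bytes appended so far

  BlockPoolSketch(int blockSize) {
    this.blockSize = blockSize;
  }

  /** Appends src back to back, spilling across blocks so no space is wasted. Returns the global start position. */
  long append(byte[] src) {
    long start = size;
    int copied = 0;
    while (copied < src.length) {
      int upto = (int) (size % blockSize);
      if (upto == 0) {
        blocks.add(new byte[blockSize]); // current block is full (or pool is empty)
      }
      byte[] block = blocks.get(blocks.size() - 1);
      int chunk = Math.min(src.length - copied, blockSize - upto);
      System.arraycopy(src, copied, block, upto, chunk);
      copied += chunk;
      size += chunk;
    }
    return start;
  }

  /**
   * Points ref at a previously appended slice. Returns true when ref references
   * the pool's own block (no copy), false when the slice straddled a block
   * boundary and had to be copied into a fresh array.
   */
  boolean setRef(Ref ref, long start, int length) {
    int blockIndex = (int) (start / blockSize);
    int offset = (int) (start % blockSize);
    ref.length = length;
    if (offset + length <= blockSize) {
      // Common case: the whole slice lives in one block, so just reference it.
      ref.bytes = blocks.get(blockIndex);
      ref.offset = offset;
      return true;
    }
    // Rare case: stitch the slice together from consecutive blocks.
    byte[] copy = new byte[length];
    int copied = 0;
    while (copied < length) {
      byte[] block = blocks.get(blockIndex++);
      int chunk = Math.min(length - copied, blockSize - offset);
      System.arraycopy(block, offset, copy, copied, chunk);
      copied += chunk;
      offset = 0;
    }
    ref.bytes = copy;
    ref.offset = 0;
    return false;
  }
}
```

Because the no-copy path hands back a view with a non-zero offset into a shared block, every consumer of such a reference has to honor both the offset and the length.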
I also stumbled upon and fixed some pre-existing "ignore {{BytesRef.offset}}"
bugs in suggest's {{SortedInputIterator}}.
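For readers unfamiliar with this class of bug, a small sketch follows. It is not the actual {{SortedInputIterator}} code; {{OffsetAwareCompare}} and its {{Ref}} type are hypothetical and only mimic the (bytes, offset, length) shape of a {{BytesRef}}. The point is that code reading from index 0 works only as long as every reference owns its own array, and breaks once a reference becomes a view into a larger shared array.

```java
/** Hypothetical illustration of an "ignore the offset" bug; not Lucene code. */
final class OffsetAwareCompare {
  /** Minimal analogue of BytesRef: a (bytes, offset, length) view over some array. */
  static final class Ref {
    final byte[] bytes;
    final int offset;
    final int length;

    Ref(byte[] bytes, int offset, int length) {
      this.bytes = bytes;
      this.offset = offset;
      this.length = length;
    }
  }

  /** Buggy: reads from index 0, so it is only correct when offset happens to be 0. */
  static int compareIgnoringOffset(Ref a, Ref b) {
    int n = Math.min(a.length, b.length);
    for (int i = 0; i < n; i++) {
      int diff = (a.bytes[i] & 0xFF) - (b.bytes[i] & 0xFF);
      if (diff != 0) {
        return diff;
      }
    }
    return a.length - b.length;
  }

  /** Correct: unsigned byte-wise comparison that starts at each view's offset. */
  static int compare(Ref a, Ref b) {
    int n = Math.min(a.length, b.length);
    for (int i = 0; i < n; i++) {
      int diff = (a.bytes[a.offset + i] & 0xFF) - (b.bytes[b.offset + i] & 0xFF);
      if (diff != 0) {
        return diff;
      }
    }
    return a.length - b.length;
  }
}
```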
This gives a ~10% speedup when merging all ~61M 2D lat/lon points in the
London, UK benchmark.
> Can we reduce excessive byte[] copying in OfflineSorter?
> --------------------------------------------------------
>
> Key: LUCENE-7071
> URL: https://issues.apache.org/jira/browse/LUCENE-7071
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Michael McCandless
> Assignee: Michael McCandless
> Fix For: master, 6.1
>
> Attachments: LUCENE-7071.patch
>
>
> OfflineSorter, which dimensional points uses heavily in the > 1D case,
> works by reading one partition (a set of N unsorted values) from disk,
> sorting it in memory, and writing it back out.
> The sort invokes a provided {{Comparator}} on two {{BytesRef}} values,
> each of which is fully copied from the {{ByteBlockPool}}, when it could
> often reference a slice from the pool instead.
> Another byte[] copy happens when iterating through the sorted values.
> This is an optimization ... I'm targeting 6.1.0 not 6.0.0!