[
https://issues.apache.org/jira/browse/LUCENE-6131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Robert Muir updated LUCENE-6131:
--------------------------------
Attachment: LUCENE-6131.patch
Patch attached. Sorry for the file movements; they make the patch huge. The only
actual code changes to SortingMergePolicy are in one method, which now looks like this:
{code}
public List<LeafReader> getMergeReaders() throws IOException {
  if (unsortedReaders == null) {
    unsortedReaders = super.getMergeReaders();
    // wrap readers, to be optimal for merge
    List<LeafReader> wrapped = new ArrayList<>(unsortedReaders.size());
    for (LeafReader leaf : unsortedReaders) {
      if (leaf instanceof SegmentReader) {
        leaf = new MergeReaderWrapper((SegmentReader)leaf);
      }
      wrapped.add(leaf);
    }
    final LeafReader atomicView;
    if (wrapped.size() == 1) {
      atomicView = wrapped.get(0);
    } else {
      final CompositeReader multiReader = new MultiReader(wrapped.toArray(new LeafReader[wrapped.size()]));
      atomicView = new SlowCompositeReaderWrapper(multiReader, true);
    }
    docMap = sorter.sort(atomicView);
    sortedView = SortingLeafReader.wrap(atomicView, docMap);
  }
  // a null doc map means that the readers are already sorted
  return docMap == null ? unsortedReaders : Collections.singletonList(sortedView);
}
{code}
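The "null doc map means already sorted" contract used by the method above can be illustrated with a minimal, Lucene-free sketch. Everything here is hypothetical and for illustration only: the class DocMapSketch, its sort method, and the use of a plain long[] of sort keys stand in for the real Sorter and reader types, whose internals are not shown in this message.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical illustration only; this is not Lucene's Sorter.
class DocMapSketch {

  /**
   * Returns a new-to-old doc map: result[i] is the original index of the
   * document that ends up at position i after sorting by key.
   * Returns null when the keys are already in non-decreasing order, so the
   * caller can skip the sorted view and reuse the original readers.
   */
  static int[] sort(long[] keys) {
    boolean alreadySorted = true;
    for (int i = 1; i < keys.length; i++) {
      if (keys[i - 1] > keys[i]) {
        alreadySorted = false;
        break;
      }
    }
    if (alreadySorted) {
      return null;
    }

    // Sort document indexes by key; Arrays.sort on objects is stable,
    // so documents with equal keys keep their original relative order.
    Integer[] order = new Integer[keys.length];
    for (int i = 0; i < order.length; i++) {
      order[i] = i;
    }
    Arrays.sort(order, Comparator.comparingLong(i -> keys[i]));

    int[] newToOld = new int[order.length];
    for (int i = 0; i < order.length; i++) {
      newToOld[i] = order[i];
    }
    return newToOld;
  }
}
```

A caller would then mirror the last line of getMergeReaders(): when sort(...) returns null, hand back the unsorted readers unchanged; otherwise wrap them in a view that applies the map.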
> optimize SortingMergePolicy
> ---------------------------
>
> Key: LUCENE-6131
> URL: https://issues.apache.org/jira/browse/LUCENE-6131
> Project: Lucene - Core
> Issue Type: Bug
> Reporter: Robert Muir
> Attachments: LUCENE-6131.patch
>
>
> This has a number of performance problems today:
> # Suboptimal stored fields merging. This is especially the case with high
> compression. Today this is 7x-64x slower than it should be.
> # RAM stacking: for any docvalues and norms fields, all instances will be
> loaded in RAM. For any string docvalues fields, all instances of global
> ordinals will be built, and none of this is released until the whole merge is
> complete.
> We can fix these two problems without completely refactoring LeafReader. We
> won't get a "bulk byte merge", checksum computation will still be suboptimal,
> and it's not a general solution to "merging with filterreaders", but that stuff
> can be for another day.