[jira] [Updated] (LUCENE-6131) optimize SortingMergePolicy

Robert Muir (JIRA) Mon, 22 Dec 2014 09:09:48 -0800

     [ https://issues.apache.org/jira/browse/LUCENE-6131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Robert Muir updated LUCENE-6131:
--------------------------------
    Attachment: LUCENE-6131.patch

Patch attached. Sorry for the file movements; they make the patch huge. The 
only actual code changes to SortingMergePolicy are in one method, which now 
looks like this:

{code}
public List<LeafReader> getMergeReaders() throws IOException {
  if (unsortedReaders == null) {
    unsortedReaders = super.getMergeReaders();
    // wrap readers, to be optimal for merge;
    List<LeafReader> wrapped = new ArrayList<>(unsortedReaders.size());
    for (LeafReader leaf : unsortedReaders) {
      if (leaf instanceof SegmentReader) {
        leaf = new MergeReaderWrapper((SegmentReader) leaf);
      }
      wrapped.add(leaf);
    }
    final LeafReader atomicView;
    if (wrapped.size() == 1) {
      atomicView = wrapped.get(0);
    } else {
      final CompositeReader multiReader =
          new MultiReader(wrapped.toArray(new LeafReader[wrapped.size()]));
      atomicView = new SlowCompositeReaderWrapper(multiReader, true);
    }
    docMap = sorter.sort(atomicView);
    sortedView = SortingLeafReader.wrap(atomicView, docMap);
  }
  // a null doc map means that the readers are already sorted
  return docMap == null ? unsortedReaders : Collections.singletonList(sortedView);
}
{code}
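
To make the "null doc map means already sorted" convention in the method above concrete, here is a minimal standalone sketch. It is not Lucene code: the class DocMapSketch, its sort method, and the long[] sort keys are hypothetical stand-ins for the sorter and the reader, chosen only so the example runs on its own.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Hypothetical illustration of the null-docMap convention; not part of Lucene. */
final class DocMapSketch {

    /**
     * Returns a new-to-old doc ID mapping ordered by sort key, or null when
     * the documents are already in order (so the caller can skip re-sorting).
     */
    static int[] sort(long[] keys) {
        boolean alreadySorted = true;
        for (int i = 1; i < keys.length; i++) {
            if (keys[i - 1] > keys[i]) {
                alreadySorted = false;
                break;
            }
        }
        if (alreadySorted) {
            return null; // signals "nothing to do", as in the method above
        }
        Integer[] ids = new Integer[keys.length];
        for (int i = 0; i < ids.length; i++) {
            ids[i] = i;
        }
        // Object sorts are stable, so ties keep their original doc order.
        Arrays.sort(ids, Comparator.comparingLong(id -> keys[id]));
        int[] newToOld = new int[ids.length];
        for (int i = 0; i < ids.length; i++) {
            newToOld[i] = ids[i];
        }
        return newToOld;
    }

    public static void main(String[] args) {
        int[] docMap = sort(new long[] {3, 1, 2});
        // Mirror the caller's branch: null means use the unsorted input as-is.
        System.out.println(docMap == null ? "already sorted" : Arrays.toString(docMap));
    }
}
```

The caller only pays for the sorted view when the mapping is non-null, which is the same branch the return statement in getMergeReaders() takes.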

> optimize SortingMergePolicy
> ---------------------------
>
>                 Key: LUCENE-6131
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6131
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Robert Muir
>         Attachments: LUCENE-6131.patch
>
>
> This has a number of performance problems today:
> # Suboptimal stored fields merging. This is especially the case with high 
> compression. Today this is 7x-64x slower than it should be.
> # RAM stacking: for any docvalues and norms fields, all instances will be 
> loaded in RAM. For any string docvalues fields, all instances of global 
> ordinals will be built, and none of this is released until the whole merge 
> is complete.
> We can fix these two problems without completely refactoring LeafReader... we 
> won't get a "bulk byte merge", checksum computation will still be suboptimal, 
> and it's not a general solution to "merging with filterreaders", but that 
> stuff can be for another day.
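
As a rough illustration of why the "RAM stacking" described above hurts, the toy model below compares peak memory when every per-field structure stays resident until the merge finishes against peak memory when each one is released right after its field is merged. It is only a sketch: the class name RamStackingSketch and the byte sizes are made up, and it does not model what Lucene or the patch actually does.

```java
/** Toy model of peak heap usage during a merge; hypothetical, not Lucene code. */
final class RamStackingSketch {

    /** Peak bytes held when every loaded field stays resident until the merge ends. */
    static long peakWhenStacked(long[] fieldBytes) {
        long held = 0;
        for (long bytes : fieldBytes) {
            held += bytes; // nothing is released, so usage only grows
        }
        return held;
    }

    /** Peak bytes held when each field is released as soon as it has been merged. */
    static long peakWhenReleased(long[] fieldBytes) {
        long peak = 0;
        for (long bytes : fieldBytes) {
            peak = Math.max(peak, bytes); // only one field is resident at a time
        }
        return peak;
    }

    public static void main(String[] args) {
        long[] fieldBytes = {10, 20, 30}; // made-up sizes for three fields
        System.out.println("stacked peak:  " + peakWhenStacked(fieldBytes));
        System.out.println("released peak: " + peakWhenReleased(fieldBytes));
    }
}
```

In this model the stacked peak is the sum of all field sizes while the released peak is only the largest single field, so the gap widens with the number of docvalues and norms fields.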



