[
https://issues.apache.org/jira/browse/LUCENE-6131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14256241#comment-14256241
]
Robert Muir commented on LUCENE-6131:
-------------------------------------
I ran quick benchmarks, indexing 1M docs of log data and sorting by timestamp. I
used 10k-doc segments, LogDocMergePolicy, and SerialMergeScheduler. All fields
were indexed and stored, and I enabled doc values (DV) on timestamp:
||compression||no sorting||sort (trunk)||sort (patch)||
|BEST_SPEED|37,552ms|56,095ms|46,309ms|
|BEST_COMPRESSION|39,132ms|206,068ms|47,395ms|
So I think it solves the worst of the worst and we can move forward from here?
Another thing that seems not to work is the "already sorted" optimization. For
this test it should be kicking in? We should look at that in another issue.
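To put the table above in relative terms, here is a small sketch that computes the cost of sorting against the unsorted baseline and the gain of the patch over trunk. The timings are copied from the table; the names `TIMINGS_MS`, `sort_overhead`, and `patch_speedup` are made up for this illustration and are not part of Lucene or the benchmark.

```python
# Timings in milliseconds, copied from the benchmark table above.
# All names in this snippet are hypothetical helpers for illustration only.
TIMINGS_MS = {
    "BEST_SPEED": {"no_sort": 37_552, "trunk": 56_095, "patch": 46_309},
    "BEST_COMPRESSION": {"no_sort": 39_132, "trunk": 206_068, "patch": 47_395},
}


def sort_overhead(row):
    """Return sorted indexing time as a multiple of the unsorted baseline."""
    base = row["no_sort"]
    return {name: row[name] / base for name in ("trunk", "patch")}


def patch_speedup(row):
    """Return how many times faster the patched sort is than sorting on trunk."""
    return row["trunk"] / row["patch"]


for mode, row in TIMINGS_MS.items():
    overhead = sort_overhead(row)
    print(
        f"{mode}: trunk {overhead['trunk']:.2f}x baseline, "
        f"patch {overhead['patch']:.2f}x baseline, "
        f"patch is {patch_speedup(row):.2f}x faster than trunk"
    )
```

On these numbers, sorting on trunk costs roughly 1.5x the baseline with BEST_SPEED and roughly 5.3x with BEST_COMPRESSION, while with the patch it is roughly 1.2x in both modes. This is a single quick run, so the ratios are indicative only.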
> optimize SortingMergePolicy
> ---------------------------
>
> Key: LUCENE-6131
> URL: https://issues.apache.org/jira/browse/LUCENE-6131
> Project: Lucene - Core
> Issue Type: Bug
> Reporter: Robert Muir
> Attachments: LUCENE-6131.patch
>
>
> This has a number of performance problems today:
> # Suboptimal stored fields merging. This is especially the case with high
> compression. Today this is 7x-64x slower than it should be.
> # RAM stacking: for any docvalues and norms fields, all instances will be
> loaded in RAM. For any string docvalues fields, all instances of global
> ordinals will be built, and none of this is released until the whole merge is
> complete.
> We can fix these two problems without completely refactoring LeafReader... we
> won't get a "bulk byte merge", checksum computation will still be suboptimal,
> and it's not a general solution to "merging with filterreaders", but that stuff
> can be for another day.