[jira] [Updated] (SOLR-14608) Faster sorting for the /export handler

Joel Bernstein (Jira) Tue, 30 Jun 2020 08:53:57 -0700


     [ 
https://issues.apache.org/jira/browse/SOLR-14608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Joel Bernstein updated SOLR-14608:
----------------------------------
    Description: 
The largest cost of the export handler is the sorting. This ticket will 
implement an improved algorithm for sorting that should greatly improve overall 
throughput for the export handler.

The current algorithm is as follows:

Collect a bitset of matching docs. Iterate over that bitset, materialize the 
top-level ordinals for the sort fields of each document, and add them to a 
priority queue of size 30,000. Then export the top 30,000 docs, turn off their 
bits in the bitset, and iterate again until all docs are sorted and sent. 
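
As a rough illustration of the loop described above, here is a minimal Java 
sketch. It is not the export handler's actual code: the class name 
BatchedExportSort, the export method, the plain int[] standing in for 
materialized top-level ordinals, and the single ascending sort field are all 
assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical stand-in for the batched sort described in the ticket.
final class BatchedExportSort {

    /**
     * Returns the matching doc ids in ascending order of their ordinal
     * (ties broken by doc id), produced in batches of at most queueSize.
     * ordinals[doc] plays the role of a materialized top-level ordinal.
     */
    static List<Integer> export(BitSet matches, int[] ordinals, int queueSize) {
        if (queueSize < 1) {
            throw new IllegalArgumentException("queueSize must be >= 1");
        }
        BitSet remaining = (BitSet) matches.clone();
        List<Integer> out = new ArrayList<>(remaining.cardinality());
        Comparator<Integer> order = Comparator
                .<Integer>comparingInt(doc -> ordinals[doc])
                .thenComparingInt(doc -> doc);

        while (!remaining.isEmpty()) {
            // Bounded max-heap: the head is the worst doc currently kept.
            PriorityQueue<Integer> queue =
                    new PriorityQueue<>(queueSize, order.reversed());

            // Full pass over every bit that is still set.
            for (int doc = remaining.nextSetBit(0); doc >= 0;
                    doc = remaining.nextSetBit(doc + 1)) {
                if (queue.size() < queueSize) {
                    queue.add(doc);
                } else if (order.compare(doc, queue.peek()) < 0) {
                    queue.poll();   // evict the current worst
                    queue.add(doc);
                }
            }

            // The heap drains worst-first, so fill the batch from the back.
            Integer[] batch = new Integer[queue.size()];
            for (int i = batch.length - 1; i >= 0; i--) {
                batch[i] = queue.poll();
            }

            // "Export" the batch and turn off its bits before the next pass.
            for (int doc : batch) {
                out.add(doc);
                remaining.clear(doc);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        BitSet matches = new BitSet();
        matches.set(0);
        matches.set(1);
        matches.set(3);
        matches.set(4);
        int[] ordinals = {5, 2, 9, 2, 7};
        // A queue size of 2 forces two passes over the bitset.
        System.out.println(export(matches, ordinals, 2));
    }
}
```

In this sketch every pass revisits all remaining bits and performs heap 
operations whose cost grows with the queue size, so the number of passes is 
roughly the match count divided by the queue size.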

There are two performance bottlenecks with this approach:

1) Materializing the top level ordinals adds a huge amount of overhead to the 
sorting process.

2) The size of the priority queue, 30,000, adds significant overhead.

Sorting algorithm to come shortly...

  was:
The largest cost of the export handler is the sorting. This ticket will 
implement an improved algorithm for sorting that should greatly improve overall 
throughput for the export handler.

The current algorithm is as follows:

Collect a bitset of matching docs. Iterate over that bitset, materialize the 
top-level ordinals for the sort fields of each document, and add them to a 
priority queue of size 30,000. Then export the top 30,000 docs, turn off their 
bits in the bitset, and iterate again until all docs are sorted and sent. 

Sorting algorithm to come shortly...


> Faster sorting for the /export handler
> --------------------------------------
>
>                 Key: SOLR-14608
>                 URL: https://issues.apache.org/jira/browse/SOLR-14608
>             Project: Solr
>          Issue Type: New Feature
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Joel Bernstein
>            Priority: Major
>
> The largest cost of the export handler is the sorting. This ticket will 
> implement an improved algorithm for sorting that should greatly improve 
> overall throughput for the export handler.
> The current algorithm is as follows:
> Collect a bitset of matching docs. Iterate over that bitset, materialize the 
> top-level ordinals for the sort fields of each document, and add them to a 
> priority queue of size 30,000. Then export the top 30,000 docs, turn off 
> their bits in the bitset, and iterate again until all docs are sorted and sent. 
> There are two performance bottlenecks with this approach:
> 1) Materializing the top-level ordinals adds a huge amount of overhead to the 
> sorting process.
> 2) The size of the priority queue, 30,000, adds significant overhead.
>
> Sorting algorithm to come shortly...



