[
https://issues.apache.org/jira/browse/LUCENE-9507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Mayya Sharipova updated LUCENE-9507:
------------------------------------
Fix Version/s: 8.9, main (9.0)
> Custom order for leaves in DirectoryReader, IndexWriter and searcher
> --------------------------------------------------------------------
>
> Key: LUCENE-9507
> URL: https://issues.apache.org/jira/browse/LUCENE-9507
> Project: Lucene - Core
> Issue Type: New Feature
> Reporter: Jim Ferenczi
> Priority: Minor
> Fix For: main (9.0), 8.9
>
> Time Spent: 5h 50m
> Remaining Estimate: 0h
>
> Now that we're able [to skip documents efficiently when sorting by a numeric
> field|https://issues.apache.org/jira/browse/LUCENE-9280], I was wondering if
> we could optimize sorted queries further by also sorting the leaf readers
> based on the primary sort.
> For time-based indices in Elasticsearch, we've implemented an optimization
> that does this at query time. If the query is sorted by a numeric doc values
> field, we sort the leaves according to the query sort before searching. When
> sorting by timestamp, this small optimization can have a big impact, since
> early termination can be reached much faster if the sort values in the
> segments don't overlap too much. Applying this optimization at query time is
> challenging: it has the benefit of working with any numeric sort field and
> order, but it requires a multi-reader that reorganizes the segments. It
> can also be surprising that, after a force merge to 1 segment, sorted queries
> may be slower, since there are no longer any leaves to sort.
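
To make the query-time idea above concrete, here is a minimal, self-contained
sketch in plain Java. It does not use Lucene: Segment, sortForDescendingSort
and countVisited are hypothetical names invented for this illustration, and
skipping a whole segment by its maximum value is a simplification of the
document skipping described in LUCENE-9280. It only models a top-k search
sorted by timestamp descending.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Toy model (not Lucene code) of sorting leaves before a sorted top-k search. */
final class LeafSortSketch {

    /** Hypothetical stand-in for a segment: its timestamps and their maximum. */
    static final class Segment {
        final String name;
        final long[] timestamps;
        final long maxTimestamp;

        Segment(String name, long... timestamps) {
            this.name = name;
            this.timestamps = timestamps;
            long max = Long.MIN_VALUE;
            for (long t : timestamps) {
                max = Math.max(max, t);
            }
            this.maxTimestamp = max;
        }
    }

    /** Returns a copy of the leaves ordered by maximum timestamp, highest first. */
    static List<Segment> sortForDescendingSort(List<Segment> leaves) {
        List<Segment> sorted = new ArrayList<>(leaves);
        sorted.sort(Comparator.comparingLong((Segment s) -> s.maxTimestamp).reversed());
        return sorted;
    }

    /**
     * Collects the k largest timestamps, visiting leaves in the given order.
     * A leaf is skipped once the heap is full and the leaf's maximum cannot
     * beat the current k-th best value. Returns the number of leaves visited.
     */
    static int countVisited(List<Segment> leaves, int k) {
        if (k <= 0) {
            return 0;
        }
        PriorityQueue<Long> topK = new PriorityQueue<>(); // min-heap of the best k values
        int visited = 0;
        for (Segment leaf : leaves) {
            if (topK.size() == k && leaf.maxTimestamp <= topK.peek()) {
                continue; // no value in this leaf is competitive
            }
            visited++;
            for (long t : leaf.timestamps) {
                if (topK.size() < k) {
                    topK.add(t);
                } else if (t > topK.peek()) {
                    topK.poll();
                    topK.add(t);
                }
            }
        }
        return visited;
    }
}
```

In this model, with three non-overlapping segments listed oldest first and
k = 2, every segment is visited in index order, while only the newest one is
visited after sorting. The more the segments overlap, the smaller the gain.
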
> So another option I'm looking at is to add the ability to provide a leaf
> order directly in the IndexWriter and DirectoryReader. That could be similar
> to an index sort, or even complementary to it, since sorting segments based on
> the index sort could also help at query time. For time-based indices that
> cannot afford index sorting but have lots of sorted queries on timestamp,
> forcing the order of segments could speed up sorted queries significantly.
> The advantage of forcing a single leaf sort in the writer/reader is that we
> can also use it to influence merges by putting the segments with the
> highest values first. That would help indices that are merged to a single
> segment but still want fast sorted queries. It would also help the
> multi-segment case, since big segments would be more likely to have the
> highest values first too.