[ 
https://issues.apache.org/jira/browse/LUCENE-8878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16879908#comment-16879908
 ] 

Haoyu Zhai commented on LUCENE-8878:
------------------------------------

[~rcmuir] If I understand LatLonPointDistanceComparator correctly, the `copy` 
method is not optimized, so once we use the comparator's internal storage 
(the `values` field) we always pay the full cost, because we always have to 
`copy` first to store a value? The actual optimization happens before `copy` 
is called: we can call `compareBottom` to filter out non-competitive points at 
a lower cost. So I guess the `values` field is not needed to keep the 
optimization, since `compareBottom` does not use `values` anyway? To keep the 
optimization for LatLonPointDistanceComparator, I guess we need `compareBottom` 
and `setBottom` plus their related fields, but we do not need to store all the 
sort values in the comparator?
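
To make this concrete, here is a minimal Java sketch of a comparator that keeps 
only `setBottom` and `compareBottom` and has no per-slot `values` storage. 
`DistanceSource`, `ArrayDistanceSource` and `BottomOnlyDistanceFilter` are 
hypothetical names, not Lucene classes, and the cheap lower bound is only a 
stand-in for whatever inexpensive check the real comparator performs. The sign 
convention (bottom compared against the document's value) is assumed to match 
`compareBottom` in Lucene.

```java
/** Hypothetical source of per-document distances; not a Lucene API. */
interface DistanceSource {
    /** Cheap lower bound on the distance of doc; never greater than exactDistance(doc). */
    double lowerBound(int doc);

    /** Exact distance of doc, assumed to be the expensive computation. */
    double exactDistance(int doc);
}

/** Array-backed source, for illustration only. */
final class ArrayDistanceSource implements DistanceSource {
    private final double[] lowerBounds;
    private final double[] exactDistances;

    ArrayDistanceSource(double[] lowerBounds, double[] exactDistances) {
        this.lowerBounds = lowerBounds;
        this.exactDistances = exactDistances;
    }

    @Override
    public double lowerBound(int doc) {
        return lowerBounds[doc];
    }

    @Override
    public double exactDistance(int doc) {
        return exactDistances[doc];
    }
}

/** Keeps only the bottom value; the caller owns the storage of the sort values. */
final class BottomOnlyDistanceFilter {
    private final DistanceSource source;
    private double bottom = Double.POSITIVE_INFINITY;
    private int exactCalls = 0;

    BottomOnlyDistanceFilter(DistanceSource source) {
        this.source = source;
    }

    /** Records the distance of the weakest hit currently in the queue. */
    void setBottom(double bottomDistance) {
        this.bottom = bottomDistance;
    }

    /**
     * Compares bottom against the distance of doc. A positive result means doc is
     * closer than bottom (competitive); zero or negative means it is not.
     */
    int compareBottom(int doc) {
        // Cheap rejection: if even the lower bound is no better than bottom,
        // the exact distance cannot be better either, so skip computing it.
        if (source.lowerBound(doc) >= bottom) {
            return -1;
        }
        exactCalls++;
        return Double.compare(bottom, source.exactDistance(doc));
    }

    /** Number of exact distance computations performed so far. */
    int exactCalls() {
        return exactCalls;
    }
}
```

In this sketch a caller would only read and store the exact value for documents 
where `compareBottom` returns a positive result, so the filter itself never 
needs a `values` array.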

Also [~hypothesisx86], I think rather than putting the comparison logic in 
`SortField`, we could have a separate comparator class and bind it to 
`ValueAccessor`, which would make customization easier?
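
A rough Java sketch of what I mean, under stated assumptions: `ValueAccessor` 
here is a hypothetical generic interface modelled on the advance-and-read style 
from the issue description, `ValueSort` and `ArrayValueAccessor` are made-up 
names, and a plain leaf ordinal stands in for `LeafReaderContext` so that the 
example has no Lucene dependency. None of this is an existing Lucene API.

```java
import java.util.Comparator;
import java.util.function.IntFunction;

/** Hypothetical per-leaf reader of sort values, used in an advance-and-read fashion. */
interface ValueAccessor<T> {
    /** Positions the accessor on doc; returns false if doc has no value. */
    boolean advanceExact(int doc);

    /** Returns the value of the document the accessor is positioned on. */
    T value();
}

/** Hypothetical binding of an accessor factory to a pluggable comparator. */
final class ValueSort<T> {
    private final IntFunction<ValueAccessor<T>> accessorFactory; // leaf ordinal -> accessor
    private final Comparator<T> comparator;

    ValueSort(IntFunction<ValueAccessor<T>> accessorFactory, Comparator<T> comparator) {
        this.accessorFactory = accessorFactory;
        this.comparator = comparator;
    }

    /** Each caller (for example each search thread) gets its own accessor for a leaf. */
    ValueAccessor<T> newValueAccessor(int leafOrd) {
        return accessorFactory.apply(leafOrd);
    }

    /** Compares two values that may come from different leaves. */
    int compare(T v1, T v2) {
        return comparator.compare(v1, v2);
    }

    /** Customization point: same accessors, different ordering. */
    ValueSort<T> withComparator(Comparator<T> custom) {
        return new ValueSort<>(accessorFactory, custom);
    }
}

/** Array-backed accessor, for illustration only. */
final class ArrayValueAccessor implements ValueAccessor<Long> {
    private final long[] values;
    private int current = -1;

    ArrayValueAccessor(long[] values) {
        this.values = values;
    }

    @Override
    public boolean advanceExact(int doc) {
        if (doc < 0 || doc >= values.length) {
            return false;
        }
        current = doc;
        return true;
    }

    @Override
    public Long value() {
        return values[current];
    }
}
```

With this shape, swapping the ordering only requires passing a different 
`Comparator`; the code that reads the values stays untouched.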

> Provide alternative sorting utility from SortField other than FieldComparator
> -----------------------------------------------------------------------------
>
>                 Key: LUCENE-8878
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8878
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/search
>    Affects Versions: 8.1.1
>            Reporter: Tony Xu
>            Priority: Major
>
> The `FieldComparator` has many responsibilities and users get all of them at 
> once. At a high level, the main functions of `FieldComparator` are
>  * Provide LeafFieldComparator
>  * Allocate storage for requested number of hits
>  * Read the values from DocValues/Custom source etc.
>  * Compare two values
> There are three major areas for improvement
>  # The logic for reading values and the logic for storing them are coupled.
>  # Users need to specify the size in order to create a `FieldComparator`, but 
> sometimes the size is unknown upfront.
>  # From `FieldComparator`'s API, one can't reason about thread-safety, so it 
> is not suitable for concurrent search.
>  E.g. can two concurrent threads use the same `FieldComparator` to call 
> `getLeafComparator` for the two different segments they are working on? In 
> fact, almost all existing implementations of `FieldComparator` are not 
> thread-safe.
> The proposal is to enhance `SortField` with two APIs
>  # `int compare(Object v1, Object v2)` – compares two values from different 
> docs for this field
>  # `ValueAccessor newValueAccessor(LeafReaderContext leaf)` – encapsulates 
> the logic for obtaining the right implementation to read the field values.
>  `ValueAccessor` should be accessed in a similar way to `DocValues`, providing 
> the sort value for a document in an advance-and-read fashion.
> With this API, we can hopefully reduce memory usage compared with 
> `FieldComparator`, whose users currently store either the sort values or at 
> least the slot number in addition to the storage allocated by 
> `FieldComparator` itself. Ideally, only one copy of the values should be 
> stored.
> The proposed API is also friendlier to concurrent search since it provides 
> the `ValueAccessor` per leaf. Although the same `ValueAccessor` can't be 
> shared when more than one thread is working on the same leaf, each thread can 
> at least initialize its own `ValueAccessor`.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
