[
https://issues.apache.org/jira/browse/LUCENE-8878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tony Xu updated LUCENE-8878:
----------------------------
Description:
The `FieldComparator` has many responsibilities, and users get all of them at
once. At a high level, the main responsibilities of `FieldComparator` are:
* Provide a `LeafFieldComparator` for each segment
* Allocate storage for the requested number of hits
* Read the values from DocValues, a custom source, etc.
* Compare two values
There are three major areas for improvement:
# The logic of reading values is coupled with the logic of storing them.
# Users need to specify the size (number of hits) in order to create a
`FieldComparator`, but sometimes the size is not known upfront.
# From `FieldComparator`'s API, one can't reason about thread-safety, so it is
not suitable for concurrent search.
For example, can two concurrent threads use the same `FieldComparator` to call
`getLeafComparator` for the two different segments they are working on? In fact,
almost all existing implementations of `FieldComparator` are not thread-safe.
The proposal is to enhance `SortField` with two APIs:
# {color:#14892c}int compare(Object v1, Object v2){color} – compares two
values from different docs for this field.
# {color:#14892c}ValueAccessor newValueAccessor(LeafReaderContext leaf){color}
– encapsulates the logic for obtaining the right implementation to read the
field values in the given leaf.
`ValueAccessor` should be accessed in a similar way to `DocValues`, providing
the sort value for a document in an advance-and-read fashion.
With this API, we can hopefully reduce memory usage compared with
`FieldComparator`: today, users store either the sort values or at least the
slot numbers, in addition to the storage allocated by `FieldComparator` itself.
Ideally, only one copy of the values should be stored.
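To make the two proposed methods and the advance-and-read pattern concrete, here is a minimal sketch. It is not Lucene code: `Leaf` is a hypothetical stand-in for `LeafReaderContext` backed by an in-memory array, `LongSortField` is a toy field, and the `advanceExact`/`value` method names on `ValueAccessor` are assumptions that only mimic how `DocValues` iterators are advanced and then read.

```java
public class ValueAccessorSketch {

    /** Hypothetical per-leaf reader of sort values: advance to a doc, then read. */
    interface ValueAccessor {
        /** Positions on {@code doc}; returns false if that doc has no value. */
        boolean advanceExact(int doc);

        /** Returns the value of the doc from the last successful advanceExact call. */
        Object value();
    }

    /** Hypothetical stand-in for LeafReaderContext: one value per doc, null if missing. */
    static final class Leaf {
        final Long[] values;

        Leaf(Long... values) {
            this.values = values;
        }
    }

    /** Toy field exposing the two proposed SortField methods (names assumed). */
    static final class LongSortField {
        int compare(Object v1, Object v2) {
            return Long.compare((Long) v1, (Long) v2);
        }

        ValueAccessor newValueAccessor(Leaf leaf) {
            // Every call returns a fresh accessor with its own cursor state.
            return new ValueAccessor() {
                private Long current;

                @Override
                public boolean advanceExact(int doc) {
                    current = leaf.values[doc];
                    return current != null;
                }

                @Override
                public Object value() {
                    return current;
                }
            };
        }
    }

    public static void main(String[] args) {
        LongSortField field = new LongSortField();
        ValueAccessor accessor = field.newValueAccessor(new Leaf(42L, null, 7L));

        Object first = accessor.advanceExact(0) ? accessor.value() : null;
        Object third = accessor.advanceExact(2) ? accessor.value() : null;
        System.out.println(accessor.advanceExact(1));        // false: doc 1 has no value
        System.out.println(field.compare(first, third) > 0); // true: 42 sorts after 7
    }
}
```

Because `newValueAccessor` returns a fresh object on each call, all per-document cursor state lives in the accessor, while `compare` stays a pure function of two values.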
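As an illustration of the "only one copy" goal, the following sketch keeps the top `n` hits for an ascending sort and stores each sort value directly in its queue entry. There is no separate slot array, and the queue grows with the hits actually collected instead of being allocated for `n` slots upfront. `Hit`, `TopN`, and the plain `Comparator<Object>` used in place of the proposed `SortField.compare` are hypothetical choices made for this example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class TopNSketch {

    /** A hit that carries its own sort value: the only copy the collector keeps. */
    static final class Hit {
        final int doc;
        final Object value;

        Hit(int doc, Object value) {
            this.doc = doc;
            this.value = value;
        }
    }

    /** Keeps the n smallest values according to {@code order} (ascending sort). */
    static final class TopN {
        private final int n;
        private final Comparator<Object> order;  // stands in for the proposed SortField.compare
        private final PriorityQueue<Hit> queue;  // head = least competitive hit

        TopN(int n, Comparator<Object> order) {
            this.n = n;
            this.order = order;
            // Reversed so that the largest (worst) value sits at the head of the queue.
            this.queue = new PriorityQueue<>((a, b) -> order.compare(b.value, a.value));
        }

        void collect(int doc, Object value) {
            if (queue.size() < n) {
                queue.add(new Hit(doc, value));
            } else if (order.compare(value, queue.peek().value) < 0) {
                queue.poll(); // evict the current worst hit
                queue.add(new Hit(doc, value));
            }
        }

        /** Returns the collected hits in ascending order of value. */
        List<Hit> results() {
            List<Hit> out = new ArrayList<>(queue);
            out.sort((a, b) -> order.compare(a.value, b.value));
            return out;
        }
    }

    public static void main(String[] args) {
        Comparator<Object> byLong = (a, b) -> Long.compare((Long) a, (Long) b);
        TopN top = new TopN(2, byLong);
        long[] values = {30L, 10L, 50L, 20L};
        for (int doc = 0; doc < values.length; doc++) {
            top.collect(doc, values[doc]);
        }
        for (Hit h : top.results()) {
            System.out.println(h.doc + " -> " + h.value); // "1 -> 10", then "3 -> 20"
        }
    }
}
```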
The proposed API is also friendlier to concurrent search since it provides
a `ValueAccessor` per leaf. Although the same `ValueAccessor` can't be shared if
more than one thread is working on the same leaf, each thread can at least
initialize its own `ValueAccessor`.
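The per-leaf, per-thread pattern can be sketched as follows. The shared field object is stateless; each task calls the (hypothetical) `newValueAccessor` itself, so two tasks working on the same leaf never share cursor state. `LongSortField`, `minOfLeaf`, and the `long[]` used as a leaf are assumptions for illustration, not Lucene types.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PerThreadAccessorSketch {

    /** Hypothetical stateful per-leaf reader; not meant to be shared across threads. */
    interface ValueAccessor {
        boolean advanceExact(int doc);
        long value();
    }

    /** Hypothetical stateless field: safe to share because only accessors hold state. */
    static final class LongSortField {
        int compare(long v1, long v2) {
            return Long.compare(v1, v2);
        }

        ValueAccessor newValueAccessor(long[] leaf) {
            return new ValueAccessor() {
                private int doc = -1; // cursor state owned by this accessor only

                @Override
                public boolean advanceExact(int target) {
                    doc = target;
                    return true; // every doc has a value in this toy leaf
                }

                @Override
                public long value() {
                    return leaf[doc];
                }
            };
        }
    }

    /** Finds the smallest value in a leaf with an accessor created by the calling thread. */
    static long minOfLeaf(LongSortField field, long[] leaf) {
        ValueAccessor accessor = field.newValueAccessor(leaf);
        long min = Long.MAX_VALUE;
        for (int doc = 0; doc < leaf.length; doc++) {
            if (accessor.advanceExact(doc) && field.compare(accessor.value(), min) < 0) {
                min = accessor.value();
            }
        }
        return min;
    }

    /** Runs one task per leaf on a thread pool, sharing only the stateless field. */
    static List<Long> searchConcurrently(LongSortField field, List<long[]> leaves) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            List<Future<Long>> futures = new ArrayList<>();
            for (long[] leaf : leaves) {
                Callable<Long> task = () -> minOfLeaf(field, leaf);
                futures.add(pool.submit(task));
            }
            List<Long> mins = new ArrayList<>();
            for (Future<Long> f : futures) {
                mins.add(f.get());
            }
            return mins;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        long[] shared = {5L, 3L, 9L};
        // The first two tasks scan the same leaf, each with its own accessor.
        List<long[]> leaves = Arrays.asList(shared, shared, new long[] {8L, 1L});
        System.out.println(searchConcurrently(new LongSortField(), leaves)); // [3, 3, 1]
    }
}
```

The design choice illustrated here is that thread confinement comes from ownership: the accessor is created and used by a single task, so no locking is needed even when two tasks read the same leaf.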
was:
The `FieldComparator` has many responsibilities, and users get all of them at
once. At a high level, the main responsibilities of `FieldComparator` are:
* Manage `LeafFieldComparator`
* Allocate storage for the requested number of hits
* Read the values from DocValues, a custom source, etc.
* Compare two values
There are two major areas for improvement:
# The logic of reading values is coupled with the logic of storing them.
# From `FieldComparator`'s API, one can't reason about thread-safety, so it
is not suitable for concurrent search.
For example, can two concurrent threads use the same `FieldComparator` to call
`getLeafComparator` for the two different segments they are working on? In fact,
almost all existing implementations of `FieldComparator` are not thread-safe.
The proposal is to enhance `SortField` with two APIs:
# int compare(Object v1, Object v2) -- compares two values from
different docs for this field.
# ValueAccessor newValueAccessor(LeafReaderContext leaf) -- encapsulates
the logic for obtaining the right implementation to read the field
values in the given leaf.
`ValueAccessor` should be accessed in a similar way to `DocValues`, providing
the sort value for a document in an advance-and-read fashion.
With this API, we can hopefully reduce memory usage compared with
`FieldComparator`: today, users store either the sort values or at least the
slot numbers, in addition to the storage allocated by `FieldComparator` itself.
Ideally, only one copy of the values should be stored.
The proposed API is also friendlier to concurrent search since it provides
a `ValueAccessor` per leaf. Although the same `ValueAccessor` can't be shared if
more than one thread is working on the same leaf, each thread can at least
initialize its own `ValueAccessor`.
> Provide alternative sorting utility from SortField other than FieldComparator
> -----------------------------------------------------------------------------
>
> Key: LUCENE-8878
> URL: https://issues.apache.org/jira/browse/LUCENE-8878
> Project: Lucene - Core
> Issue Type: Improvement
> Components: core/search
> Affects Versions: 8.1.1
> Reporter: Tony Xu
> Priority: Major
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)