[
https://issues.apache.org/jira/browse/LUCENE-1435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12646619#action_12646619
]
Steven Rowe commented on LUCENE-1435:
-------------------------------------
Hi Mike,
bq. Could we, alternatively, push this change into DocumentsWriter, such that on
writing a segment it uses a per-field Collator (FieldInfo would be extended to
record this) to sort the terms dict?
Are you suggesting that we not store collation keys in the index?
bq. I haven't fully thought through the tradeoffs... but it seems like this'd
be simpler to use? Ie rather than putting a CollationKeyFilter in your analyzer
chain, and then doing the reverse of this for all searches at search time, you
simply set the Collator on the fields (at indexing & searching time, since I
agree we should for now not try to serialize into the index which field has
which Collator)?
The query-time process in this patch is not the reverse - it is exactly the
same. The String-encoded collation keys stored in the index are compared
directly with those from query terms. Neither the String-encoding nor the
CollationKey needs to be reversed.
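
To illustrate the point, here is a minimal sketch, not the code from the patch. It assumes the JDK's java.text.Collator and compares the raw bytes of each CollationKey; the patch additionally encodes those bytes as a String with IndexableBinaryStringTools, a step omitted here. The class and method names are hypothetical. The same keyBytes() call serves both the indexed term and the query term, and nothing is ever decoded.

```java
import java.text.Collator;
import java.util.Locale;

final class CollationKeyCompareSketch {

    /** Compares two byte arrays as unsigned bytes, most significant byte first. */
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    /** Same transformation at index time and at query time (hypothetical helper). */
    static byte[] keyBytes(Collator collator, String term) {
        return collator.getCollationKey(term).toByteArray();
    }

    /** Sign of the comparison obtained from the key bytes alone. */
    static int compareViaKeys(Collator collator, String left, String right) {
        return Integer.signum(
            compareUnsigned(keyBytes(collator, left), keyBytes(collator, right)));
    }

    public static void main(String[] args) {
        Collator collator = Collator.getInstance(Locale.ENGLISH);
        String indexedTerm = "apple";
        String queryTerm = "Banana";
        // Plain String order and collated order disagree for this pair.
        System.out.println("String.compareTo sign: "
            + Integer.signum(indexedTerm.compareTo(queryTerm)));
        System.out.println("Collator.compare sign: "
            + Integer.signum(collator.compare(indexedTerm, queryTerm)));
        System.out.println("Key bytes sign:        "
            + compareViaKeys(collator, indexedTerm, queryTerm));
    }
}
```

The byte-wise comparison relies on the documented contract of CollationKey.toByteArray(), which says the arrays can be compared, most significant byte first, to get the same result as comparing the keys themselves.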
bq. I guess there is a performance cost to using the Collator to do live binary
search (during searching) and sorting (during indexing) vs doing unicode String
comparisons, but in practice at search time this is probably a tiny part of the
net cost of searching?
In the current code base, a range search on a collated field has to run the
Collator against every single term in the field, comparing each one with the
range endpoints. This patch allows skipTo to function when using collation,
potentially providing a significant speedup.
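
The difference can be sketched roughly as follows. This is an illustration only, with a TreeMap standing in for the sorted terms dictionary and raw CollationKey bytes standing in for the String-encoded keys; none of the names below come from the patch. scanRange() mimics the current behaviour by running the Collator over every term, while seekRange() jumps straight to the lower bound because the keys are already stored in collated order.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;
import java.util.TreeMap;

final class CollatedRangeSketch {

    /** Orders key bytes as unsigned values, most significant byte first. */
    static final Comparator<byte[]> UNSIGNED_BYTES = (a, b) -> {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    };

    static byte[] key(Collator collator, String term) {
        return collator.getCollationKey(term).toByteArray();
    }

    /** Current approach: every term is compared with both endpoints. */
    static List<String> scanRange(Collator collator, List<String> terms,
                                  String lower, String upper) {
        List<String> hits = new ArrayList<>();
        for (String term : terms) {
            if (collator.compare(term, lower) >= 0
                    && collator.compare(term, upper) <= 0) {
                hits.add(term);
            }
        }
        hits.sort(collator);
        return hits;
    }

    /** Stand-in for a terms dictionary sorted by collation key.
     *  Terms with identical keys would overwrite each other in this toy index. */
    static TreeMap<byte[], String> buildKeyIndex(Collator collator, List<String> terms) {
        TreeMap<byte[], String> index = new TreeMap<>(UNSIGNED_BYTES);
        for (String term : terms) {
            index.put(key(collator, term), term);
        }
        return index;
    }

    /** With keys in the index: seek to the lower bound, stop at the upper bound. */
    static List<String> seekRange(Collator collator, TreeMap<byte[], String> index,
                                  String lower, String upper) {
        return new ArrayList<>(
            index.subMap(key(collator, lower), true, key(collator, upper), true).values());
    }

    public static void main(String[] args) {
        Collator collator = Collator.getInstance(Locale.ENGLISH);
        List<String> terms = Arrays.asList("elder", "Date", "cherry", "Banana", "apple");
        TreeMap<byte[], String> index = buildKeyIndex(collator, terms);
        System.out.println(scanRange(collator, terms, "b", "d"));
        System.out.println(seekRange(collator, index, "b", "d"));
    }
}
```

Both methods should return the same terms; the second one only touches the entries inside the requested range.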
> CollationKeyFilter: convert tokens into CollationKeys encoded using
> IndexableBinaryStringTools
> ----------------------------------------------------------------------------------------------
>
> Key: LUCENE-1435
> URL: https://issues.apache.org/jira/browse/LUCENE-1435
> Project: Lucene - Java
> Issue Type: New Feature
> Affects Versions: 2.4
> Reporter: Steven Rowe
> Priority: Minor
> Fix For: 2.9
>
> Attachments: LUCENE-1435.patch, LUCENE-1435.patch
>
>
> Converts each token into its CollationKey using the provided collator, and
> then encodes the CollationKey with IndexableBinaryStringTools, to allow it to
> be stored as an index term.
> This will allow for efficient range searches and Sorts over fields that need
> collation for proper ordering.