[ https://issues.apache.org/jira/browse/LUCENE-1279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12630766#action_12630766 ]
Grant Ingersoll commented on LUCENE-1279:
-----------------------------------------

{quote}
Mostly this consisted of switching away from deprecated Hits in tests.
{quote}

It seems the new tests in TestRangeFilter still use Hits.

Also, from the Collator javadocs:

{quote}
When sorting a list of Strings however, it is generally necessary to compare each String multiple times. In this case, CollationKeys provide better performance. The CollationKey class converts a String to a series of bits that can be compared bitwise against other CollationKeys. A CollationKey is created by a Collator object for a given String.
{quote}

I don't think we need to implement this now, but I wonder whether there would be a performance difference if we compared CollationKeys instead. The big question is whether the cost of constructing a key for each term outweighs the savings on the repeated comparisons against lower and upper.

One more question, which probably shows my lack of knowledge here: would it be possible to enumerate the code points where there are problems and handle them separately, somehow? Basically, how pervasive is the problem? Would we be better off checking whether one of these problem code points falls within the lower/upper range and handling only that case separately? Or perhaps some reasoning would let us narrow in on lowerTerm/upperTerm instead of having to check every term in the field. Just thinking out loud...

Otherwise, it looks pretty good.
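To make the trade-off concrete, here is a minimal sketch using only the standard java.text APIs. It is not the code from the LUCENE-1279 patch; the class and helper names (CollatedRangeSketch, inRangeCodePoint, inRangeCollator, inRangeKeys) are hypothetical, and inclusive bounds are assumed. It contrasts plain String comparison, Collator.compare, and CollationKey comparison where the keys for lower and upper are built once.

```java
import java.text.CollationKey;
import java.text.Collator;
import java.util.Locale;

public class CollatedRangeSketch {

    // String.compareTo compares UTF-16 code units: the code-point-style
    // ordering the issue reports as wrong for many languages.
    static boolean inRangeCodePoint(String term, String lower, String upper) {
        return term.compareTo(lower) >= 0 && term.compareTo(upper) <= 0;
    }

    // Locale-sensitive check: Collator.compare is called twice per term,
    // and each call may redo collation work on both strings.
    static boolean inRangeCollator(Collator collator, String term,
                                   String lower, String upper) {
        return collator.compare(term, lower) >= 0
            && collator.compare(term, upper) <= 0;
    }

    // CollationKey variant: the bound keys are built once by the caller,
    // but every candidate term still pays for one getCollationKey call.
    static boolean inRangeKeys(Collator collator, String term,
                               CollationKey lowerKey, CollationKey upperKey) {
        CollationKey termKey = collator.getCollationKey(term);
        return termKey.compareTo(lowerKey) >= 0
            && termKey.compareTo(upperKey) <= 0;
    }

    public static void main(String[] args) {
        Collator collator = Collator.getInstance(Locale.ENGLISH);
        String lower = "d";
        String upper = "f";
        // Build the bound keys once, outside the per-term loop.
        CollationKey lowerKey = collator.getCollationKey(lower);
        CollationKey upperKey = collator.getCollationKey(upper);

        // U+00E9 (e with acute accent) is written as a Unicode escape
        // to avoid source-encoding issues.
        String[] terms = {"E", "\u00e9", "e", "g"};
        for (String term : terms) {
            System.out.println(term
                + " codePoint=" + inRangeCodePoint(term, lower, upper)
                + " collator=" + inRangeCollator(collator, term, lower, upper)
                + " keys=" + inRangeKeys(collator, term, lowerKey, upperKey));
        }
    }
}
```

With the English collator, "E" and "\u00e9" fall outside [d, f] under code-point comparison (uppercase letters sort before lowercase, and U+00E9 sorts after 'f') but inside it under collation; "e" is in range and "g" is out either way. Note that in a range check each term's key is compared only twice, so whether the CollationKey variant actually wins would need to be measured.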
> RangeQuery and RangeFilter should use collation to check for range inclusion
> ----------------------------------------------------------------------------
>
>                 Key: LUCENE-1279
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1279
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>    Affects Versions: 2.3.1
>            Reporter: Steven Rowe
>            Assignee: Grant Ingersoll
>            Priority: Minor
>             Fix For: 2.4
>
>         Attachments: LUCENE-1279.patch, LUCENE-1279.patch, LUCENE-1279.patch
>
>
> See [this java-user discussion|http://www.nabble.com/lucene-farsi-problem-td16977096.html] of problems caused by Unicode code-point comparison, instead of collation, in RangeQuery.
> RangeQuery could take in a Locale via a setter, which could be used with a java.text.Collator and/or CollationKeys, to handle ranges for languages which have alphabet orderings different from those in Unicode.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.