[jira] Commented: (LUCENE-1279) RangeQuery and RangeFilter should use collation to check for range inclusion

Grant Ingersoll (JIRA) Sat, 13 Sep 2008 06:49:14 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-1279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12630766#action_12630766
 ]


Grant Ingersoll commented on LUCENE-1279:
-----------------------------------------

{quote}
Mostly this consisted of switching away from deprecated Hits in tests. 
{quote}

Seems like the new tests in TestRangeFilter still use Hits.

Also, from the Collator javadocs:
{quote}
When sorting a list of Strings however, it is generally necessary to compare 
each String multiple times. In this case, CollationKeys provide better 
performance. The CollationKey class converts a String to a series of bits that 
can be compared bitwise against other CollationKeys. A CollationKey is created 
by a Collator object for a given String. 
{quote}

I don't think we need to implement this now, but I wonder whether there would 
be a performance difference if we created a CollationKey for each comparison.  
The big question is whether the cost of constructing a key for each term 
outweighs the savings from the repeated comparisons against lower and upper.  
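
To make the trade-off concrete, here is a minimal sketch of the two ways a 
term could be checked against the range bounds. The class and method names 
are hypothetical and are not taken from the LUCENE-1279 patch; only 
java.text.Collator and java.text.CollationKey are real APIs. The keys for 
lower and upper are built once, while a key for each term has to be built 
per term, which is the cost in question.

```java
import java.text.CollationKey;
import java.text.Collator;
import java.util.Locale;

// Hypothetical illustration only; not the code in the attached patch.
class CollationKeyRangeSketch {

    // Option 1: compare the raw strings with the Collator on every check.
    static boolean inRangeWithCollator(Collator collator, String term,
                                       String lower, String upper) {
        return collator.compare(term, lower) >= 0
            && collator.compare(term, upper) <= 0;
    }

    // Option 2: lower/upper keys are precomputed by the caller; only the
    // term's key is constructed here, then compared twice.
    static boolean inRangeWithKeys(Collator collator, String term,
                                   CollationKey lowerKey, CollationKey upperKey) {
        CollationKey termKey = collator.getCollationKey(term);
        return termKey.compareTo(lowerKey) >= 0
            && termKey.compareTo(upperKey) <= 0;
    }

    public static void main(String[] args) {
        Collator collator = Collator.getInstance(Locale.ENGLISH);
        String lower = "apple";
        String upper = "cherry";
        CollationKey lowerKey = collator.getCollationKey(lower);
        CollationKey upperKey = collator.getCollationKey(upper);

        String[] terms = {"apple", "Banana", "cherry", "date"};
        for (String term : terms) {
            boolean viaCollator = inRangeWithCollator(collator, term, lower, upper);
            boolean viaKeys = inRangeWithKeys(collator, term, lowerKey, upperKey);
            System.out.println(term + ": collator=" + viaCollator + ", keys=" + viaKeys);
        }
    }
}
```

Both options give the same answer for every term; which one is faster when 
each term is compared only twice would have to be measured.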

One more question, and it probably shows my lack of knowledge here, but would 
it be possible to enumerate the various codepoints where there are problems and 
just handle these separately, somehow?  Basically, how pervasive is the 
problem?  Would we perhaps be better off having a check to see if one of these 
bad codepoints falls in the range of lower/upper and then handle it separately? 
Or, perhaps, some reasoning would allow us to better narrow in on the 
lowerTerm/upperTerm instead of having to check the whole field.  Just thinking 
out loud...
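
As a rough way to see where the two orderings part ways, the sketch below 
lists the terms whose range membership differs between String.compareTo 
(UTF-16 code unit order) and a Collator. The names are hypothetical, and the 
sample uses English letter case rather than the Farsi data from the original 
report, purely because it is easy to verify by eye.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration only; not part of the attached patch.
class RangeOrderDisagreementSketch {

    // Range inclusion using plain String ordering (UTF-16 code units).
    static boolean inRangeByCodeUnits(String term, String lower, String upper) {
        return term.compareTo(lower) >= 0 && term.compareTo(upper) <= 0;
    }

    // Range inclusion using locale-sensitive collation.
    static boolean inRangeByCollation(Collator collator, String term,
                                      String lower, String upper) {
        return collator.compare(term, lower) >= 0
            && collator.compare(term, upper) <= 0;
    }

    // Returns the terms for which the two orderings give different answers.
    static List<String> disagreements(Collator collator, List<String> terms,
                                      String lower, String upper) {
        List<String> result = new ArrayList<String>();
        for (String term : terms) {
            if (inRangeByCodeUnits(term, lower, upper)
                    != inRangeByCollation(collator, term, lower, upper)) {
                result.add(term);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Collator collator = Collator.getInstance(Locale.ENGLISH);
        List<String> terms = Arrays.asList("apple", "Banana", "cherry", "date");
        // "Banana" sorts before "apple" by code unit ('B' < 'a') but between
        // "apple" and "cherry" under English collation.
        System.out.println(disagreements(collator, terms, "apple", "cherry"));
    }
}
```

Running something like this over the terms of a real field would give a 
sense of how pervasive the problem is for a given locale, though it still 
visits every term.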

Otherwise, looks pretty good.

> RangeQuery and RangeFilter should use collation to check for range inclusion
> ----------------------------------------------------------------------------
>
>                 Key: LUCENE-1279
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1279
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>    Affects Versions: 2.3.1
>            Reporter: Steven Rowe
>            Assignee: Grant Ingersoll
>            Priority: Minor
>             Fix For: 2.4
>
>         Attachments: LUCENE-1279.patch, LUCENE-1279.patch, LUCENE-1279.patch
>
>
> See [this java-user 
> discussion|http://www.nabble.com/lucene-farsi-problem-td16977096.html] of 
> problems caused by Unicode code-point comparison, instead of collation, in 
> RangeQuery.
> RangeQuery could take in a Locale via a setter, which could be used with a 
> java.text.Collator and/or CollationKey's, to handle ranges for languages 
> which have alphabet orderings different from those in Unicode.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


