[ 
https://issues.apache.org/jira/browse/LUCENE-1435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12646679#action_12646679
 ] 

Steven Rowe commented on LUCENE-1435:
-------------------------------------

bq. And wouldn't there be times when you also want to reverse the encoding? EG 
if you enum all terms for presentation (maybe as part of faceted search for 
example)?

AFAIK, CollationKey generation is a one-way operation.  If the original terms 
are required for presentation, they can be stored, right?
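
To illustrate why the key can't be turned back into the term, here is a minimal sketch using java.text.Collator.  The class and method names are made up for this example, and PRIMARY strength is only an assumption chosen to make the effect obvious: at that strength, strings differing only in case produce keys that compare equal, so the key bytes alone can't say which string they came from.

```java
import java.text.CollationKey;
import java.text.Collator;
import java.util.Locale;

// Hypothetical demo class, not part of the LUCENE-1435 patch.
public class CollationKeyOneWayDemo {

    /** Returns true if the two strings produce collation keys that compare equal. */
    static boolean sameKey(Collator collator, String a, String b) {
        CollationKey ka = collator.getCollationKey(a);
        CollationKey kb = collator.getCollationKey(b);
        return ka.compareTo(kb) == 0;
    }

    /** Builds an English collator that ignores case and accent differences. */
    static Collator primaryEnglish() {
        Collator collator = Collator.getInstance(Locale.ENGLISH);
        collator.setStrength(Collator.PRIMARY); // assumption: primary strength only
        return collator;
    }

    public static void main(String[] args) {
        Collator collator = primaryEnglish();
        // Two distinct strings, one key: the mapping is many-to-one.
        System.out.println(sameKey(collator, "resume", "Resume"));
        // Different base letters still yield different keys.
        System.out.println(sameKey(collator, "resume", "resumes"));
    }
}
```

At stronger settings the keys for these two strings would differ, but the key is still an opaque byte sequence rather than an encoding of the text, which is why storing the original term is the practical route.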

{quote}
Here are some pros of internal-to-indexing:
      [...]
    - Real terms are stored in the index - tools like Luke can look at
      the index and see normal looking terms. Though... I don't have a
      sense of what the encoded term would look like - maybe it's not
      that different from the original in practice?
{quote}

IndexableBinaryStringTools (LUCENE-1434) implements a base-8000h encoding: 
each character carries 15 bits (1-7/8 bytes) of the input, packed into its 
lower 15 bits.  The result looks radically different from the original byte 
array, at least when viewed with a tool like Luke.  And I don't think 
CollationKeys themselves are intended for human consumption.
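
As a rough picture of what "15 bits per character" means, here is a minimal sketch of such a packing.  It is not the IndexableBinaryStringTools code from LUCENE-1434; the class and method names are hypothetical, and it ignores details a real encoder needs, such as recording how many padding bits the last character holds.

```java
// Hypothetical sketch, not the LUCENE-1434 implementation.
public class FifteenBitPacker {

    /**
     * Treats the input as one long bit string and writes it out 15 bits at a
     * time into the low 15 bits of each char; the high bit of every char stays 0.
     * The last char is zero-padded on the right if the bit count is not a
     * multiple of 15.
     */
    static char[] pack(byte[] bytes) {
        int totalBits = bytes.length * 8;
        int numChars = (totalBits + 14) / 15; // round up
        char[] out = new char[numChars];
        for (int bit = 0; bit < totalBits; bit++) {
            // Read bits most-significant first from each byte.
            int value = (bytes[bit / 8] >> (7 - bit % 8)) & 1;
            if (value != 0) {
                // Write bits most-significant first into the 15-bit slot.
                out[bit / 15] |= (char) (1 << (14 - bit % 15));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // 15 bytes = 120 bits = exactly 8 chars of 15 bits each.
        System.out.println(pack(new byte[15]).length);
    }
}
```

Every output character falls in the range 0000h-7FFFh, hence "base 8000h", and the characters bear no visible relation to the input bytes.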

{quote}
bq. In the current code base, for range searching on a collated field, every 
single term has to be collated with the search term. This patch allows skipTo 
to function when using collation, potentially providing a significant speedup.

Both the original proposed approach (external-to-indexing) and this
internal-to-indexing approach would solve this, right? Ie, in both
cases the terms have been sorted according to the Collator, but in the
internal-to-indexing case it's the original String term stored in the
terms dict.
{quote}

Perhaps I'm missing something, but o.a.l.index.TermEnum.skipTo(Term) compares 
the target term against each enumerated term via Term.compareTo(), which in 
turn uses String.compareTo() on the term text.  So if the index term 
dictionary were ordered by a Collator, skipTo() wouldn't necessarily stop at 
the correct location, right?  From TermEnum.java:

{code:java}
  public boolean skipTo(Term target) throws IOException {
    do {
      if (!next())
        return false;
    } while (target.compareTo(term()) > 0);
    return true;
  }
{code}

and here's o.a.l.index.Term.compareTo(Term):

{code:java}
  public final int compareTo(Term other) {
    if (field == other.field)                     // fields are interned
      return text.compareTo(other.text);
    else
      return field.compareTo(other.field);
  }
{code}
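
To make the concern concrete, here is a small self-contained sketch that mimics the skipTo() loop over an in-memory list sorted by a Collator.  Everything in it (the class name, the helper methods, the sample terms, the English locale) is invented for illustration and is not Lucene code; it only shows that a String.compareTo()-based scan can stop at a different place than a collation-aware one.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical demo class; it does not use any Lucene API.
public class SkipToOrderingDemo {

    /** Sample terms, sorted the way a Collator-ordered term dictionary would be. */
    static List<String> collatedTerms(Collator collator) {
        List<String> terms =
            new ArrayList<>(Arrays.asList("cherry", "Zebra", "apple", "Banana"));
        terms.sort(collator); // English collation: apple, Banana, cherry, Zebra
        return terms;
    }

    /** Same stopping rule as the skipTo() loop above: stop at the first term >= target by String.compareTo(). */
    static String skipToByCodeUnit(List<String> terms, String target) {
        for (String term : terms) {
            if (target.compareTo(term) <= 0) {
                return term;
            }
        }
        return null; // ran off the end, like skipTo() returning false
    }

    /** What a collation-aware skip would do: stop at the first term >= target by Collator order. */
    static String skipToByCollator(List<String> terms, String target, Collator collator) {
        for (String term : terms) {
            if (collator.compare(target, term) <= 0) {
                return term;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Collator collator = Collator.getInstance(Locale.ENGLISH);
        List<String> terms = collatedTerms(collator);
        System.out.println(skipToByCodeUnit(terms, "Cat"));
        System.out.println(skipToByCollator(terms, "Cat", collator));
    }
}
```

With the target "Cat", the code-unit comparison stops at the very first term, because 'C' sorts before every lowercase letter, while the collation-aware scan moves past "apple" and "Banana" first.  Indexing the encoded CollationKeys avoids the mismatch if the encoded strings compare under String.compareTo() in the same order as the keys, which is what the patch relies on.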


> CollationKeyFilter: convert tokens into CollationKeys encoded using 
> IndexableBinaryStringTools
> ----------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-1435
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1435
>             Project: Lucene - Java
>          Issue Type: New Feature
>    Affects Versions: 2.4
>            Reporter: Steven Rowe
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1435.patch, LUCENE-1435.patch
>
>
> Converts each token into its CollationKey using the provided collator, and 
> then encodes the CollationKey with IndexableBinaryStringTools, to allow it to 
> be stored as an index term.
> This will allow for efficient range searches and Sorts over fields that need 
> collation for proper ordering.


