[jira] Commented: (LUCENE-1673) Move TrieRange to core

Michael McCandless (JIRA) Mon, 15 Jun 2009 13:38:34 -0700

    [ https://issues.apache.org/jira/browse/LUCENE-1673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12719761#action_12719761 ]


Michael McCandless commented on LUCENE-1673:
--------------------------------------------

OK, let's open a new issue for how best to integrate/default SortField
and FieldCache.

bq. Nevertheless, I would like to remove emphasis from NumericUtils (which is 
in reality a helper class).

+1

bq. For bytes, TrieRange is not very interesting; for shorts, maybe, but I 
would index them as simple integers. You could not speed up 
searching, but you could limit index size a little.

Well, a RangeQuery on a "plain text" byte or short field requires
sneakiness (knowing that you must zero-pad; keeping
document.NumberUtils around); I think it's best if NumericXXX in
Lucene handles all of java's native numeric types.  And you want a
byte[] or short[] out of FieldCache (to not waste RAM having to
upgrade to an int[]).
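To make the zero-padding point concrete: a text range query compares terms in String order, so unpadded "10" sorts before "9". The following is a minimal sketch, assuming a hypothetical encode() helper (not Lucene's NumberTools or Solr's NumberUtils) that offsets a signed short into 0..65535 and pads it to five digits so that term order matches numeric order.

```java
import java.util.Arrays;

public class ZeroPadSketch {
    // Hypothetical helper, not an existing Lucene/Solr API: shift the signed
    // short into 0..65535, then left-pad to 5 digits so that String (term)
    // order equals numeric order.
    public static String encode(short value) {
        int shifted = value - Short.MIN_VALUE; // -32768 -> 0, 32767 -> 65535
        return String.format("%05d", shifted);
    }

    public static void main(String[] args) {
        // Plain text terms: lexicographic order is not numeric order.
        String[] plain = {"9", "10", "100"};
        Arrays.sort(plain);
        System.out.println(Arrays.toString(plain)); // [10, 100, 9]

        // Offset + padded terms: lexicographic order is numeric order.
        String[] padded = {encode((short) 100), encode((short) -5), encode((short) 9)};
        Arrays.sort(padded);
        System.out.println(Arrays.toString(padded)); // [32763, 32777, 32868]
    }
}
```

Every user of a plain-text numeric field has to know and apply this kind of encoding consistently at index and query time, which is the "sneakiness" a built-in NumericXXX for all native types would remove.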

We can do this under the (a?) new issue too...

bq. The SortField factory is then the only part really needed in NumericUtils, 
and not even that. The parser is a singleton that works for all trie fields and 
could also live somewhere else, or nowhere at all if the Parsers all stay in 
FieldCache.

(Under a new issue, but...) I'm not really a fan of leaving the parser
in FieldCache and expecting a user to "know" to create the SortField
with that parser.  NumericSortField would make it much more consumable
to "direct" Lucene users.

{quote}
bq. Can we rename RangeQuery -> TextRangeQuery (TermRangeQuery), to make it 
clear that its range checking is by Term sort order.

We can do this and deprecate the old one, but I added a note to Javadocs (see 
patch). I would do this outside of this issue.
{quote}

OK.

One benefit of a rename is it's a reminder to users on upgrading to
consider whether they should in fact switch to NumericRangeQuery.

{quote}
bq. How about oal.util.NumericUtils instead of TrieUtils?

That was my first idea, too. What should we do with o.a.l.document.NumberTools 
(deprecate?). And also update contrib/spatial to use NumericUtils instead of 
the copied and not really good NumberUtils from Solr (Yonik said it was written 
at a very early stage, and is not efficient with UTF-8 encoding and 
TermEnum positioning with the term prefixes). It would be an index-format change 
for spatial, but as the code has not yet been released (in Lucene), the Lucene 
version should not use NumberUtils at all.
{quote}

+1 on both (if we can add byte/short to trie*); we should do this
before 2.9 since we can still change locallucene's format.  Maybe open
a new issue for that, too?  We're forking off new 2.9 issues left and
right here!!

bq. I think I'll remove the ShiftAttribute completely; it's really useless. 
Maybe I'll add a getShift() method to NumericUtils that returns the shift value 
of a Token/String. See the java-dev mailing list thread with Yonik.
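Why a getShift() can replace a ShiftAttribute: if each indexed term carries its own shift, the precision level is recoverable from the term text alone. A minimal sketch, assuming a hypothetical term layout (shift in the first char, then the remaining high-order bits as fixed-width hex) that is NOT the actual NumericUtils format; encode(), getShift() and termsFor() here are illustrative names only.

```java
import java.util.ArrayList;
import java.util.List;

public class TrieTermsSketch {
    // Hypothetical term layout, not Lucene's real encoding: first char encodes
    // the shift, followed by the remaining high-order bits as 16 lowercase hex
    // digits (fixed width keeps String order equal to numeric order).
    public static String encode(long value, int shift) {
        long sortable = value ^ Long.MIN_VALUE; // flip sign bit: negatives sort first
        long prefix = sortable >>> shift;       // drop the lowest 'shift' bits
        return (char) ('A' + shift) + String.format("%016x", prefix);
    }

    // Counterpart of the proposed getShift(): recover the shift from the term.
    public static int getShift(String term) {
        return term.charAt(0) - 'A';
    }

    // One term per precision level, from full precision (shift 0) upward.
    public static List<String> termsFor(long value, int precisionStep) {
        List<String> terms = new ArrayList<>();
        for (int shift = 0; shift < 64; shift += precisionStep) {
            terms.add(encode(value, shift));
        }
        return terms;
    }

    public static void main(String[] args) {
        for (String term : termsFor(1234L, 16)) {
            System.out.println(getShift(term) + " -> " + term);
        }
    }
}
```

Since the shift is part of every term, a consumer reading terms back (from a TermEnum or a Token) needs no extra attribute to know which precision level a term belongs to.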

OK

{quote}
bq. Did you think about / decide against making a NumericField (that'd set the 
right tokenStream itself)?

Field is final and so I must extend AbstractField. But some methods of Document 
return Field and not AbstractField.
{quote}

Can we just un-final Field?

{quote}
NumericField would only work for indexing, but when retrieving from index 
(stored fields), it would change to Field.

Maybe we should move this after the index-specific schemas and so on. Or 
document that it can only be used for indexing.
{quote}

True, but we already have such "challenges" between index vs search
time Document; documenting it seems fine.

bq. By the way: How do you like the factories in NumericRangeQuery and the 
setValue methods, working like StringBuffer.append() in NumericTokenStream? 
This makes it really easy to index.

I think this is great!  I like that you return NumericTokenStream :)
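The StringBuffer.append()-style setters can be illustrated without Lucene. A minimal sketch, assuming a hypothetical NumericStreamSketch class standing in for NumericTokenStream (the patch's actual method names and fields may differ): each setter stores the value and returns this, so one instance can be reset inline and handed straight to whatever consumes it.

```java
public class FluentSetterSketch {
    // Hypothetical stand-in for a reusable numeric token stream; the real
    // NumericTokenStream API in the patch may differ.
    public static final class NumericStreamSketch {
        private long value;
        private int valueSize; // 32 for int, 64 for long

        // Each setter returns 'this', like StringBuffer.append(), so the call
        // can be written inline where the stream is consumed.
        public NumericStreamSketch setIntValue(int v) {
            this.value = v;
            this.valueSize = 32;
            return this;
        }

        public NumericStreamSketch setLongValue(long v) {
            this.value = v;
            this.valueSize = 64;
            return this;
        }

        public long getValue() { return value; }
        public int getValueSize() { return valueSize; }
    }

    public static void main(String[] args) {
        // One instance reused across "documents", reset inline each time.
        NumericStreamSketch stream = new NumericStreamSketch();
        for (long price : new long[] {10L, 250L, 99L}) {
            NumericStreamSketch s = stream.setLongValue(price);
            System.out.println(s.getValueSize() + "-bit value " + s.getValue());
        }
    }
}
```

Returning the stream itself is what lets the setter call sit directly inside the expression that builds the field, while still reusing a single object across documents.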

bq. The only benefit of NumericField would be the possibility to 
automatically disable TF and norms by default when indexing.

Consumability (good defaults)!  (And also not having to know that you
must go and get a tokenStream from NumericUtils).


> Move TrieRange to core
> ----------------------
>
>                 Key: LUCENE-1673
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1673
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Search
>    Affects Versions: 2.9
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 2.9
>
>         Attachments: LUCENE-1673.patch, LUCENE-1673.patch
>
>
> TrieRange was iterated many times and seems stable now (LUCENE-1470, 
> LUCENE-1582, LUCENE-1602). There is lots of user interest, Solr added it to 
> its default FieldTypes (SOLR-940) and if possible I want to move it to core 
> before release of 2.9.
> Before this can be done, there are some things to think about:
> # There are now classes called LongTrieRangeQuery and IntTrieRangeQuery; how 
> should they be called in core? I would suggest leaving them as they are. On the 
> other hand, if this remains our only numeric query implementation, we could 
> call them LongRangeQuery, IntRangeQuery or NumericRangeQuery (see below; there 
> are problems here). Same for the TokenStreams and Filters.
> # Maybe the pairs of classes for indexing and searching should be moved into 
> one class: NumericTokenStream, NumericRangeQuery, NumericRangeFilter. The 
> problem here: ctors must be able to pass int, long, double, float as range 
> parameters. For the end user, mixing these 4 types in one class is hard to 
> handle. If somebody forgets to add a L to a long, it suddenly instantiates a 
> int version of range query, hitting no results and so on. Same with other 
> types. Maybe accept java.lang.Number as parameter (because nullable for 
> half-open bounds) and one enum for the type.
> # Move TrieUtils into o.a.l.util? Or o.a.l.document? Or elsewhere?
> # Move TokenStreams into o.a.l.analysis, ShiftAttribute into 
> o.a.l.analysis.tokenattributes? Somewhere else?
> # If we rename the classes, should Solr stay with Trie (because there are 
> different impls)?
> # Maybe add a subclass of AbstractField that automatically creates these 
> TokenStreams and omits norms/tf by default, for easier addition to Document 
> instances?
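The overload pitfall in item 2 of the issue description, and the proposed Number-plus-enum alternative, can be shown in plain Java. A minimal sketch, assuming hypothetical newRange() factories and a hypothetical NumericType enum (neither is the API in the patch): integer literals without an L bind to the int overload, while the Number variant makes the type explicit and uses null for an open bound.

```java
public class OverloadPitfallSketch {
    // Hypothetical overloaded factories, one per primitive type.
    public static String newRange(int min, int max) { return "int range"; }
    public static String newRange(long min, long max) { return "long range"; }

    // Hypothetical alternative from the proposal: Number bounds (null means
    // an open bound) plus an explicit type enum.
    public enum NumericType { INT, LONG, FLOAT, DOUBLE }

    public static String newRange(NumericType type, Number min, Number max) {
        String lo = (min == null) ? "*" : min.toString();
        String hi = (max == null) ? "*" : max.toString();
        return type + " [" + lo + " TO " + hi + "]";
    }

    public static void main(String[] args) {
        // Literals without 'L' are ints, so the int overload is chosen even if
        // the field was indexed as long.
        System.out.println(newRange(0, 100));   // int range
        System.out.println(newRange(0L, 100L)); // long range
        // Explicit type, half-open upper bound via null.
        System.out.println(newRange(NumericType.LONG, 0L, null)); // LONG [0 TO *]
    }
}
```

The enum variant trades compile-time overload selection for an explicit, checkable declaration of intent; a real implementation would presumably also validate that the Number subclass matches the declared type.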

-- 
This message is automatically generated by JIRA.
