[
https://issues.apache.org/jira/browse/SOLR-5936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13954438#comment-13954438
]
Jack Krupansky commented on SOLR-5936:
--------------------------------------
As part of this cleanup, could somebody volunteer to write a plain-English
summary of exactly what a trie field is, what good it is, and why we can't
live without it? I've read the code and, okay, there is a sequence of bit
shifts and generation of extra terms, but in plain English, what's the
point?
I'm not asking for a recitation of the actual algorithm(s), but for some
intuitively accessible summary. I would note that the typical examples of
tries use strings with prefixes rather than binary numbers.
See:
http://en.wikipedia.org/wiki/Trie
And, is trie really the best solution for number types? Does it actually have
real value for float and double values?
And I would really like to see some plain, easily readable explanation of
precision step. Again, especially for real numbers.
And how should precision step be used for dates?
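For reference, the "bit shifts and generation of extra terms" mentioned above
can be sketched roughly as follows. This is only an illustration under stated
assumptions, not Lucene's actual code: the function name trie_terms and its
parameters are hypothetical, and the real implementation encodes terms
differently. It assumes a 32-bit int whose sign bit is flipped so that
unsigned order matches signed order, and then indexed once per shift of
0, precisionStep, 2*precisionStep, and so on.

```python
def trie_terms(value, precision_step=8, bits=32):
    """Return the (shift, prefix) pairs a trie-style field might index.

    Hypothetical helper for illustration only; not a Lucene or Solr API.
    """
    if not 0 < precision_step <= bits:
        raise ValueError("precision_step must be between 1 and bits")
    if not -(1 << (bits - 1)) <= value < (1 << (bits - 1)):
        raise ValueError("value does not fit in a signed %d-bit int" % bits)

    # Flip the sign bit so that negative numbers sort before positive
    # ones when the result is compared as an unsigned integer.
    sortable = (value + (1 << (bits - 1))) & ((1 << bits) - 1)

    # One term per shift: each term drops `shift` low-order bits, so it
    # stands for a whole bucket of 2**shift adjacent values.
    return [(shift, sortable >> shift)
            for shift in range(0, bits, precision_step)]


if __name__ == "__main__":
    for v in (1000, 1023, 1024):
        print(v, trie_terms(v, precision_step=8))
```

In this sketch, 1000 and 1023 share the same term at shift 8 while 1024 does
not, so a range query could match that one coarse term instead of every
full-precision value in the bucket. A smaller precision step yields more
terms per value (a larger index) in exchange for fewer terms to visit per
range query.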
I mean, other than assuring sort order, why bother with trie? Or more
specifically, why does a Solr (or Lucene) user need to know that trie is used
for the implementation?
Specifically, for example, does it matter whether a field has an evenly
distributed range of numeric values with little repetition, or numeric codes
with a relatively small number of distinct values (e.g., 1-10, or scores of
0-100, or dates in years between 1970 and 2014), each repeated across many
documents? I mean, does trie do a uniformly great job for both of these
extreme use cases, including for faceting?
And if trie really is the best approach for numeric fields, why not just do all
of this under the hood instead of polluting the field type names with "trie"?
IOW, rename TrieIntField to IntField, etc.
To me, trie just seems like unnecessary noise to average users.
> Deprecate non-Trie-based numeric (and date) field types in 4.x and remove
> them from 5.0
> ---------------------------------------------------------------------------------------
>
> Key: SOLR-5936
> URL: https://issues.apache.org/jira/browse/SOLR-5936
> Project: Solr
> Issue Type: Task
> Components: Schema and Analysis
> Reporter: Steve Rowe
> Assignee: Steve Rowe
> Priority: Minor
> Fix For: 4.8, 5.0
>
> Attachments: SOLR-5936.branch_4x.patch, SOLR-5936.branch_4x.patch
>
>
> We've been discouraging people from using non-Trie numeric and date field
> types for years; it's time we made it official.