Michael McCandless created LUCENE-5966:
------------------------------------------

             Summary: How to migrate from numeric fields to auto-prefix terms
                 Key: LUCENE-5966
                 URL: https://issues.apache.org/jira/browse/LUCENE-5966
             Project: Lucene - Core
          Issue Type: Improvement
            Reporter: Michael McCandless


In LUCENE-5879 we are adding auto-prefix terms to the default terms dict. This 
generalizes the prefix terms that numeric fields index today, and offers faster 
performance while using less index space and about the same indexing time.

But there are many users out there with existing indices that contain numeric 
fields ... so ideally we would have some simple way for such users to switch 
over to auto-prefix terms.

Robert has a good plan (copied from LUCENE-5879):

Here are some thoughts.
# Keep the current trie "encoding" for terms; it just uses precision step=Inf 
and lets the term dictionary do the rest automatically.
# Create a FilterAtomicReader that, for a previously trie-encoded field, removes 
the "fake" terms on merge.

Users could continue to use NumericRangeQuery, just with the infinite precision 
step, and it will always work. It will just execute slower on old segments, 
because it does not take advantage of the trie terms that are not yet merged 
away.
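
As a rough illustration of the two steps above, here is a minimal plain-Java 
sketch. It deliberately does not use Lucene classes: TrieTermsSketch, Term, 
trieTerms and dropFakeTerms are hypothetical names invented for this example, 
and the real trie encoding also packs the shift and value into term bytes, 
which is omitted here. It only shows which terms a finite precision step 
produces, what precision step=Inf leaves, and what filtering the "fake" terms 
on merge would keep.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model only; not Lucene's API or byte-level encoding.
class TrieTermsSketch {

    /** One indexed term: the value with its low `shift` bits dropped. */
    static final class Term {
        final int shift;
        final long prefix;

        Term(int shift, long prefix) {
            this.shift = shift;
            this.prefix = prefix;
        }

        @Override
        public String toString() {
            return "shift=" + shift + " prefix=" + prefix;
        }
    }

    /** Terms a trie-encoded long field would index for a single value. */
    static List<Term> trieTerms(long value, int precisionStep) {
        List<Term> terms = new ArrayList<>();
        // shift=0 is the full-precision term; larger shifts are the extra
        // lower-precision ("fake") terms used to speed up range queries.
        for (int shift = 0; shift < 64; shift += precisionStep) {
            terms.add(new Term(shift, value >>> shift));
        }
        return terms;
    }

    /** What a merge-time filter would keep: only full-precision terms. */
    static List<Term> dropFakeTerms(List<Term> terms) {
        List<Term> kept = new ArrayList<>();
        for (Term t : terms) {
            if (t.shift == 0) {
                kept.add(t);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        long value = 0x12345678L;
        // Old-style segment: finite precision step, several terms per value.
        List<Term> oldStyle = trieTerms(value, 16);
        System.out.println("precisionStep=16:  " + oldStyle);
        // New-style segment: "infinite" step, only the full-precision term.
        System.out.println("precisionStep=Inf: "
                + trieTerms(value, Integer.MAX_VALUE));
        // After a filtered merge the old segment looks like the new one.
        System.out.println("after merge:       " + dropFakeTerms(oldStyle));
    }
}
```

In this model, a segment written with a finite precision step ends up with 
the same terms as a precision step=Inf segment once the filter has run, which 
is why the same query could keep working across both kinds of segment.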

One obstacle to making this really nice is that Lucene does not know for sure 
that a field is numeric, so it cannot be "full-auto". Apps would have to use 
their schema or whatever to wrap with this reader in their merge policy.

Maybe we could provide some sugar for this, such as a wrapping merge policy 
that takes a list of field names that are numeric, or sugar to pass this to IWC 
in IndexUpgrader to force it, and so on.
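
To make the "list of field names" idea concrete, here is a small plain-Java 
sketch of the decision such sugar would have to make. NumericFieldMergeFilter 
and keepOnMerge are hypothetical names, not Lucene classes or methods, and the 
shift argument assumes the caller has already decoded it from a trie-encoded 
term. How this would actually plug into a merge policy or IndexUpgrader is 
left open.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper; models the per-term decision only, not a MergePolicy.
class NumericFieldMergeFilter {
    private final Set<String> numericFields;

    /** The app supplies the numeric field names, e.g. from its schema. */
    NumericFieldMergeFilter(String... numericFields) {
        this.numericFields = new HashSet<>(Arrays.asList(numericFields));
    }

    /**
     * Returns true if a term should survive the merge. Only fields declared
     * numeric are filtered; for every other field all terms are kept, since
     * their bytes cannot safely be interpreted as trie-encoded.
     */
    boolean keepOnMerge(String field, int shift) {
        return shift == 0 || !numericFields.contains(field);
    }

    public static void main(String[] args) {
        NumericFieldMergeFilter filter =
                new NumericFieldMergeFilter("price", "timestamp");
        System.out.println(filter.keepOnMerge("price", 0));  // full precision
        System.out.println(filter.keepOnMerge("price", 16)); // "fake" term
        System.out.println(filter.keepOnMerge("title", 16)); // not numeric
    }
}
```

Passing the field names explicitly reflects the point above that the app, not 
Lucene, knows which fields are numeric.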



