[
https://issues.apache.org/jira/browse/LUCENE-1541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12675695#action_12675695
]
Uwe Schindler commented on LUCENE-1541:
---------------------------------------
After thinking one night longer about the whole issue, I suspect, that
non-equidistant precision steps are really needed:
Ning's comment is correct about the number of terms. But if you only index long
values from, e.g., 0L to 10000L, you create a lot of terms for shift values 0
to 14 (because the terms are in this range). For shift values 15 to 63, the
term is always the same constant term. The index' TermEnum so contains not many
additional values (because its only *one* term for *all* documents), only
additional TermDocs entries are created. It's the same like adding one "flag"
term to all documents. This does not use much additional space in index. When
you query a range, these terms are never used, but they do not hurt.
The additional space for the trie terms is generated by higher precision (lower
shift) values. If you index with precision step 4 or 2 instead of a precision
step of 8, you create a lot of *different* terms for the lower shift values.
The constant terms in the higher shifts are still always the same and does not
consume much space.
I will create a small comparison on index size for long values without higher
bits, but I suspect, that index size without lower precision terms reduces
space significant. If this is the case, I do not think the additional
complexity of the API is needed for this low impact. If somebody really wants
to optimize index size so much, he can create a optimized fork of TrieRange in
his project that indexes with non-equidistant precision steps. On the other
hand, I would suggest to use ints/floats instead of longs/doubles, if only
lower precision is needed. In this case, less terms will be created. For floats
in comparision to doubles, the effect will be bigger (because even small
doubles use a lot of hgher precision bits), a non-equidistant precision step
will not help very much here.
> Trie range - make trie range indexing more flexible
> ---------------------------------------------------
>
> Key: LUCENE-1541
> URL: https://issues.apache.org/jira/browse/LUCENE-1541
> Project: Lucene - Java
> Issue Type: Improvement
> Components: contrib/*
> Affects Versions: 2.9
> Reporter: Ning Li
> Assignee: Uwe Schindler
> Priority: Minor
> Fix For: 2.9
>
> Attachments: LUCENE-1541.patch, LUCENE-1541.patch
>
>
> In the current trie range implementation, a single precision step is
> specified. With a large precision step (say 8), a value is indexed in fewer
> terms (8) but the number of terms for a range can be large. With a small
> precision step (say 2), the number of terms for a range is smaller but a
> value is indexed in more terms (32).
> We want to add an option that different precision steps can be set for
> different precisions. An expert can use this option to keep the number of
> terms for a range small and at the same time index a value in a small number
> of terms. See the discussion in LUCENE-1470 that results in this issue.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]