On 07/30/2014 11:31 AM, Michael McCandless wrote:
If you indexed with precStep=Integer.MAX_VALUE then your index should
only have the "shift 0" terms, I think.
Can you boil the exception case (values 1 to 2500) down to a small test case?
The issue was the copyChars method of the BytesRef class:
  /**
   * Copies the UTF8 bytes for this string.
   *
   * @param text Must be well-formed unicode text, with no
   * unpaired surrogates or invalid UTF16 code units.
   */
  public void copyChars(CharSequence text) {
    assert offset == 0;   // TODO broken if offset != 0
    UnicodeUtil.UTF16toUTF8(text, 0, text.length(), this);
  }
As it says in the code, it is broken when offset != 0.
Except that the assertion didn't fire for me (I'm using Lucene from
Kotlin; there may be some interop issues, as Kotlin is still cooking).
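One possible explanation, independent of Kotlin: the JVM skips assert statements unless it is started with -ea (or -enableassertions). A minimal sketch to check whether assertions are active at runtime; the class name AssertionStatus is only illustrative and is not part of Lucene:

```java
// Illustrative helper, not part of Lucene.
final class AssertionStatus {

    /** Returns true only when this class runs with assertions enabled. */
    static boolean enabled() {
        boolean on = false;
        // The assignment inside the assert executes only if assertions are enabled.
        assert on = true;
        return on;
    }

    public static void main(String[] args) {
        System.out.println("assertions enabled: " + enabled());
    }
}
```

If this reports false, an assert such as the one in copyChars would be skipped silently, whatever the value of offset.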
In Kotlin, I was reusing the BytesRef returned by termsEnum?.next()
to turn the prefix-coded int term into a BytesRef holding the string
representation of the int, like this:
  while (true) {
    BytesRef ref = termsEnum.next()
    if (ref == null) break
    int shift = ref.bytes[ref.offset] - NumericUtils.SHIFT_START_INT
    if (shift >= 0 && shift <= 31) {
      ref.copyChars(NumericUtils.prefixCodedToInt(ref).toString())
    }
  }
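To illustrate what "broken if offset != 0" can mean, here is a small self-contained sketch. It does not use Lucene: SliceDemo is a hypothetical stand-in for a (bytes, offset, length) slice, and copyCharsIgnoringOffset assumes one possible failure mode, where the copy writes from index 0 and updates only the length. Whether Lucene's copyChars behaves exactly this way is an assumption.

```java
import java.nio.charset.StandardCharsets;

// Simplified stand-in for a (bytes, offset, length) slice; NOT Lucene's BytesRef.
final class SliceDemo {
    byte[] bytes;
    int offset;
    int length;

    SliceDemo(byte[] bytes, int offset, int length) {
        this.bytes = bytes;
        this.offset = offset;
        this.length = length;
    }

    /** Assumed failure mode: writes from index 0 but leaves offset untouched. */
    void copyCharsIgnoringOffset(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        if (bytes.length < utf8.length) {
            bytes = new byte[utf8.length];
        }
        System.arraycopy(utf8, 0, bytes, 0, utf8.length);
        length = utf8.length;
    }

    /** Safer pattern: build a fresh slice instead of mutating a shared one. */
    static SliceDemo freshCopy(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new SliceDemo(utf8, 0, utf8.length);
    }

    /** Decodes the bytes in [offset, offset + length) as UTF-8. */
    String utf8ToString() {
        return new String(bytes, offset, length, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // A slice that starts at offset 2 of a shared buffer.
        SliceDemo shared = new SliceDemo("xxabc".getBytes(StandardCharsets.UTF_8), 2, 3);
        shared.copyCharsIgnoringOffset("42");
        System.out.println(shared.utf8ToString());          // stale bytes read from offset 2
        System.out.println(freshCopy("42").utf8ToString()); // the intended text
    }
}
```

Under that assumption, copying the decoded value into a new object, rather than into the reference handed out by the enumeration, avoids the problem and leaves the enumeration's own buffer untouched.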
Still, it would be nice if the copyChars method were fixed, or at least if
the warning about copyChars being broken for offset != 0 appeared not only
in the code but also in the Javadoc.
Best regards,
Olivier
Mike McCandless
http://blog.mikemccandless.com
On Wed, Jul 30, 2014 at 2:21 AM, Olivier Binda <olivier.bi...@wanadoo.fr> wrote:
Hello.
How do you get the terms from a TermsEnum of an IntField indexed with
precisionStep = Integer.MAX_VALUE, obtained with
MultiFields.getTerms(reader, intField).iterator(null)?
I had mixed success trying to get the terms out of this iterator with
NumericUtils.prefixCodedToInt.
I tried:
  while (true) {
    BytesRef ref = termsEnum?.next()
    if (ref == null) break
    int value = NumericUtils.prefixCodedToInt(ref)
  }
But it doesn't work reliably, because of the trie structure I guess.
In an IntField with values 1, 2, 3, 4, 5 it worked.
But in an IntField with all values from 1 to 2500, I got exceptions:
lots of shifts aren't in the 0..31 range, and it looks like there are
"blocks" made of:
first, a term with shift 0 and value n,
followed by lots of terms whose shift isn't in 0..31 but that share the
same prefix...
Best regards,
Olivier