On 07/30/2014 11:31 AM, Michael McCandless wrote:
If you indexed with precStep=Integer.MAX_VALUE then your index should
only have the "shift 0" terms, I think.

Can you boil the exception case (values 1 to 2500) down to a small test case?

The issue was the copyChars method of the BytesRef class:

/**
 * Copies the UTF8 bytes for this string.
 *
 * @param text Must be well-formed unicode text, with no
 * unpaired surrogates or invalid UTF16 code units.
 */
public void copyChars(CharSequence text) {
  assert offset == 0; // TODO broken if offset != 0
  UnicodeUtil.UTF16toUTF8(text, 0, text.length(), this);
}


As the comment in the code says, it is broken when offset != 0.
However, the assertion didn't fire for me (I'm using Lucene from Kotlin; there may be some interop issues, as Kotlin is still cooking).

In Kotlin, I was reusing the BytesRef returned by termsEnum?.next(), overwriting the prefix-coded int it held with the string representation of that int, like this:

while (true) {
    BytesRef ref = termsEnum.next()
    if (ref == null) break
    int shift = ref.bytes[ref.offset] - NumericUtils.SHIFT_START_INT
    if (shift > 31 || shift < 0) {} else {
        ref.copyChars(NumericUtils.prefixCodedToInt(ref).toString())
    }
}


Still, it would be nice if copyChars were fixed, or at least if the warning about it being broken for offset != 0 appeared in the Javadoc and not only in the code.
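
For anyone hitting the same problem, here is a minimal, self-contained sketch of the hazard. It does not use Lucene: Slice and SharedEnum are hypothetical stand-ins for a (bytes, offset, length) view and for an enumerator that hands out the same view over one shared buffer, and copyAssumingZeroOffset only mimics a copy that writes at index 0 of the backing array. Whether a real TermsEnum shares its buffer this way is an assumption here.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class SliceReuseDemo {
    /** Hypothetical stand-in for a (bytes, offset, length) view; not a Lucene class. */
    static final class Slice {
        byte[] bytes;
        int offset;
        int length;

        Slice(byte[] bytes, int offset, int length) {
            this.bytes = bytes;
            this.offset = offset;
            this.length = length;
        }

        String utf8() {
            return new String(bytes, offset, length, StandardCharsets.UTF_8);
        }
    }

    /** Hypothetical enumerator: three 2-byte terms in one array, one reused Slice. */
    static final class SharedEnum {
        private final byte[] block = "aabbcc".getBytes(StandardCharsets.UTF_8);
        private final Slice current = new Slice(block, 0, 0);
        private int upto = 0;

        Slice next() {
            if (upto >= block.length) return null;
            current.bytes = block;
            current.offset = upto;
            current.length = 2;
            upto += 2;
            return current;
        }
    }

    /** Mimics a copy that assumes offset == 0: it writes at index 0 of the backing array. */
    static void copyAssumingZeroOffset(Slice target, String text) {
        byte[] src = text.getBytes(StandardCharsets.UTF_8);
        // Assumes the backing array is large enough for this sketch.
        System.arraycopy(src, 0, target.bytes, 0, src.length);
        target.length = src.length;
    }

    /** Returns the terms as the loop saw them, with or without in-place mutation. */
    public static List<String> collect(boolean mutateInPlace) {
        SharedEnum te = new SharedEnum();
        List<String> seen = new ArrayList<>();
        for (Slice ref = te.next(); ref != null; ref = te.next()) {
            seen.add(ref.utf8());
            if (mutateInPlace) {
                // Overwrites bytes that the enumerator still owns.
                copyAssumingZeroOffset(ref, "1234");
            } else {
                // Safe variant: write the converted text into a fresh slice.
                Slice scratch = new Slice(new byte[8], 0, 0);
                copyAssumingZeroOffset(scratch, "1234");
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(collect(false)); // terms are left untouched
        System.out.println(collect(true));  // the second term is clobbered
    }
}
```

In this sketch the in-place variant clobbers the second term because the write lands on bytes the enumerator has not returned yet; allocating a separate slice for the converted text avoids touching the enumerator's buffer at all.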

Best regards,
Olivier





Mike McCandless

http://blog.mikemccandless.com


On Wed, Jul 30, 2014 at 2:21 AM, Olivier Binda <olivier.bi...@wanadoo.fr> wrote:
Hello.

How do you get the terms from the TermsEnum of an IntField indexed with
precisionStep = Integer.MAX_VALUE, obtained with
MultiFields.getTerms(reader, intField).iterator(null)?

I had mixed success trying to get the terms out of this iterator with
NumericUtils.prefixCodedToInt.

I tried

while (true) {
    BytesRef ref = termsEnum?.next()
    if (ref == null) break
    int value = NumericUtils.prefixCodedToInt(ref)
}

But it doesn't work reliably, because of the trie structure I guess.

In an IntField with values 1, 2, 3, 4, 5 it worked.
But in an IntField with all values from 1 to 2500, I got exceptions:
lots of shifts aren't in the 0..31 range, and it looks like there are
"blocks" consisting of:

first a term with shift 0 and value n,
followed by lots of terms whose shift isn't in 0..31 but that share the
same prefix...
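
To make the "shift" discussion concrete, here is a small sketch of how a prefix-coded int term is laid out, as I understand the Lucene 4.x NumericUtils format: a first byte holding SHIFT_START_INT + shift, followed by the sign-flipped value in 7-bit groups. The constant 0x60 and the layout are assumptions to check against your Lucene version; this class is a standalone illustration and not the library code.

```java
public class PrefixCodedIntSketch {
    // Assumed to match NumericUtils.SHIFT_START_INT in Lucene 4.x.
    static final int SHIFT_START_INT = 0x60;

    /** Encodes value at the given shift (0..31): header byte, then 7 bits per byte. */
    public static byte[] encode(int value, int shift) {
        int nChars = (31 - shift) / 7 + 1;
        byte[] out = new byte[nChars + 1];
        out[0] = (byte) (SHIFT_START_INT + shift);
        // Flip the sign bit so encoded terms sort like signed ints, then drop low bits.
        int sortableBits = (value ^ 0x80000000) >>> shift;
        for (int i = nChars; i > 0; i--) {
            out[i] = (byte) (sortableBits & 0x7f);
            sortableBits >>>= 7;
        }
        return out;
    }

    /** Reads the shift from the first byte; rejects bytes that are not a valid header. */
    public static int shiftOf(byte[] term) {
        int shift = term[0] - SHIFT_START_INT;
        if (shift < 0 || shift > 31) {
            throw new NumberFormatException("not a prefix-coded int, shift=" + shift);
        }
        return shift;
    }

    /** Decodes the term; for shift > 0 the low 'shift' bits come back as zero. */
    public static int decode(byte[] term) {
        int shift = shiftOf(term);
        int sortableBits = 0;
        for (int i = 1; i < term.length; i++) {
            sortableBits = (sortableBits << 7) | term[i];
        }
        return (sortableBits << shift) ^ 0x80000000;
    }

    public static void main(String[] args) {
        System.out.println(decode(encode(2500, 0))); // full-precision term
        System.out.println(decode(encode(2500, 8))); // lower-precision term
    }
}
```

Under these assumptions, a first byte that yields a shift outside 0..31 means the bytes are not a valid prefix-coded int at all, which is worth checking before calling a decoder on them.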

Best regards,
Olivier




