Hi Trejkaz,

Negative norm values are legal. The problem here is that Lucene expects a
document that has no terms for a field to either have no norm value at all
(typically because the document doesn't have a value for the field), or to
have a norm value equal to 0 (typically because the token stream over the
field value produced no tokens).
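
To make that expectation concrete, here is a minimal sketch of the rule as
I described it above. This is not the CheckIndex code; the class and method
names are made up for illustration, and it only models the case of a
document without terms.

```java
import java.util.OptionalLong;

// Hypothetical illustration of the consistency rule described above.
// NOT the actual CheckIndex implementation.
public class NormInvariantSketch {

    /**
     * @param hasTermsInPostings whether postings list any term for this doc/field
     * @param norm               the norm value, or empty if the doc has no norm
     * @return false when a doc without terms carries a non-zero norm
     */
    static boolean isConsistent(boolean hasTermsInPostings, OptionalLong norm) {
        if (hasTermsInPostings) {
            // The rule above only constrains documents without terms.
            return true;
        }
        // No terms: either no norm at all, or a norm equal to 0.
        return !norm.isPresent() || norm.getAsLong() == 0L;
    }

    public static void main(String[] args) {
        // The case from the report: no terms in postings, norm of -1.
        System.out.println(isConsistent(false, OptionalLong.of(-1L)));
        // No terms and no norm (field absent from the document).
        System.out.println(isConsistent(false, OptionalLong.empty()));
        // No terms and a norm of 0 (token stream produced no tokens).
        System.out.println(isConsistent(false, OptionalLong.of(0L)));
    }
}
```

The first call models document 0 from your output, which is why the check
fails even though -1 is a legal norm value in itself.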

Are you using a custom similarity or one of the built-in Lucene ones? With
the Lucene similarities, a norm of -1 would only arise if the field had a
number of tokens very close to Integer.MAX_VALUE.
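
For reference, here is a rough standalone sketch of why that is. It assumes
the built-in similarities store the field length as a single byte using an
encoding along the lines of SmallFloat.intToByte4; the code below is my own
re-implementation written from memory for illustration (the class and method
names are made up), so treat it as an approximation and check it against the
Lucene sources.

```java
// Standalone approximation of a lossy int-to-byte length encoding.
// Assumption: this mirrors what the built-in similarities do; it is
// not copied from Lucene and does not depend on it.
public class NormEncodingSketch {

    // Float-like encoding: 3 mantissa bits plus a shift, implicit leading bit.
    static int longToInt4Sketch(long i) {
        if (i < 0) {
            throw new IllegalArgumentException("Only non-negative values, got " + i);
        }
        int numBits = 64 - Long.numberOfLeadingZeros(i);
        if (numBits < 4) {
            return Math.toIntExact(i); // small values are stored exactly
        }
        int shift = numBits - 4;
        int encoded = Math.toIntExact(i >>> shift); // keep the 4 top bits
        encoded &= 0x07;                            // drop the implicit leading bit
        encoded |= (shift + 1) << 3;                // store the shift, 0 is reserved
        return encoded;
    }

    // Codes left unused by the encoding above are given to small exact values.
    static final int NUM_FREE_VALUES = 255 - longToInt4Sketch(Integer.MAX_VALUE);

    static byte intToByte4Sketch(int length) {
        if (length < 0) {
            throw new IllegalArgumentException("Only non-negative values, got " + length);
        }
        if (length < NUM_FREE_VALUES) {
            return (byte) length;
        }
        // The largest lengths map to 255, which is -1 once cast to a signed byte.
        return (byte) (NUM_FREE_VALUES + longToInt4Sketch(length - NUM_FREE_VALUES));
    }

    public static void main(String[] args) {
        System.out.println(intToByte4Sketch(0));
        System.out.println(intToByte4Sketch(10));
        System.out.println(intToByte4Sketch(Integer.MAX_VALUE));
    }
}
```

Under these assumptions a field with no tokens encodes to 0 and only a huge
token count wraps around to -1, so a -1 on a document without terms points
at either a custom similarity or something going wrong during migration.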

On Thu, Jun 11, 2020 at 4:22 AM Trejkaz <trej...@trypticon.org> wrote:

> Hi all.
>
> We use CheckIndex as a post-migration sanity check and are seeing this
> quirk, and I'm wondering whether negative norms are even legit or
> whether they should have been treated as if they were zero...
>
> TX
>
>
> 0.00% total deletions; 378 documents; 0 deleteions
> Segments file=segments_1 numSegments=1 version=8.5.1
> id=52isly98kogao7j0cnautwknj
>   1 of 1: name=_0 maxDoc=378
>     version=8.5.1
>     id=52isly98kogao7j0cnautwkni
>     codec=Lucene84
>     compound=false
>     numFiles=18
>     size (MB)=0.663
>     diagnostics = {java.vendor=Oracle Corporation, os=Mac OS X,
> java.version=1.8.0_191, java.vm.version=25.191-b12,
> lucene.version=8.5.1, os.arch=x86_64,
> java.runtime.version=1.8.0_191-b12, source=addIndexes(CodecReader...),
> os.version=10.15.5, timestamp=1591841756208}
>     no deletions
>     test: open reader.........OK [took 0.004 sec]
>     test: check integrity.....OK [took 0.002 sec]
>     test: check live docs.....OK [took 0.000 sec]
>     test: field infos.........OK [36 fields] [took 0.000 sec]
>     test: field norms.........OK [26 fields] [took 0.001 sec]
>     test: terms, freq, prox...ERROR: java.lang.RuntimeException:
> Document 0 doesn't have terms according to postings but has a norm
> value that is not zero: -1
>
> java.lang.RuntimeException: Document 0 doesn't have terms according to
> postings but has a norm value that is not zero: -1
> at org.apache.lucene.index.CheckIndex.checkFields(CheckIndex.java:1678)
> at org.apache.lucene.index.CheckIndex.testPostings(CheckIndex.java:1871)
> at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:724)
> at org.apache.lucene.index.CheckIndex.doCheck(CheckIndex.java:2973)
>
>     test: stored fields.......OK [15935 total field count; avg 42.2
> fields per doc] [took 0.003 sec]
>     test: term vectors........OK [1173 total term vector count; avg
> 3.1 term/freq vector fields per doc] [took 0.170 sec]
>     test: docvalues...........OK [16 docvalues fields; 11 BINARY; 2
> NUMERIC; 0 SORTED; 2 SORTED_NUMERIC; 1 SORTED_SET] [took 0.003 sec]
>     test: points..............OK [4 fields, 1509 points] [took 0.000 sec]
> FAILED
>     WARNING: exorciseIndex() would remove reference to this segment;
> full exception:
> java.lang.RuntimeException: Term Index test failed
> at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:750)
> at org.apache.lucene.index.CheckIndex.doCheck(CheckIndex.java:2973)
>
> WARNING: 1 broken segments (containing 378 documents) detected
> Took 0.355 sec total.
> WARNING: would write new segments file, and 378 documents would be
> lost, if -exorcise were specified
>

-- 
Adrien
