Re: CheckIndex complaining about -1 for norms value

Adrien Grand Thu, 11 Jun 2020 00:00:49 -0700

To my knowledge, -1 always represented the maximum supported length, both
before and after 7.0 (when we changed the norms encoding). One thing that
changed when we introduced sparse norms is that documents with no value
moved from having 0 as a norm to not having a norm at all, but I don't see
how this could explain what you are seeing either.
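
As a rough illustration of why -1 corresponds to the maximum supported
length: assuming the 7.0+ similarities pack the field length into a single
byte the way SmallFloat.intToByte4 does, the longest lengths map to the
unsigned value 255, which reads back as -1 when treated as a signed byte.
The class below is a standalone approximation written for illustration, not
the Lucene source; NormEncodingSketch and its constant names are
hypothetical, and the real implementation should be checked against the
Lucene version in use.

```java
/** Standalone sketch of a length-to-byte norm encoding; not the Lucene source. */
final class NormEncodingSketch {

    /** Encodes a non-negative value as a tiny float: 3 explicit mantissa bits plus a shift. */
    static int longToInt4(long i) {
        if (i < 0) {
            throw new IllegalArgumentException("Only non-negative values are supported, got " + i);
        }
        int numBits = 64 - Long.numberOfLeadingZeros(i);
        if (numBits < 4) {
            return (int) i;                // values below 8 are stored exactly
        }
        int shift = numBits - 4;
        int encoded = (int) (i >>> shift); // keep the 4 most significant bits
        encoded &= 0x07;                   // drop the leading bit, which is implicit
        encoded |= (shift + 1) << 3;       // store the shift; 0 is reserved for exact values
        return encoded;
    }

    /** Largest code produced for an int input (assumed layout). */
    static final int MAX_INT4 = longToInt4(Integer.MAX_VALUE);

    /** Number of byte values left over, used to store the shortest lengths exactly. */
    static final int NUM_FREE_VALUES = 255 - MAX_INT4;

    /** Maps a field length to the single byte stored as the norm. */
    static byte intToByte4(int length) {
        if (length < 0) {
            throw new IllegalArgumentException("Only non-negative values are supported, got " + length);
        }
        if (length < NUM_FREE_VALUES) {
            return (byte) length;          // short fields keep their exact length
        }
        return (byte) (NUM_FREE_VALUES + longToInt4(length - NUM_FREE_VALUES));
    }

    public static void main(String[] args) {
        // An empty field encodes to 0; only enormous lengths reach the byte 0xFF, i.e. -1.
        System.out.println("length 0 -> " + intToByte4(0));
        System.out.println("length Integer.MAX_VALUE -> " + intToByte4(Integer.MAX_VALUE));
    }
}
```

Under these assumptions a norm of -1 can only come from a field with roughly
two billion tokens or more, which is why seeing it on a document with no
postings at all is surprising.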


Do you know which Lucene version initially indexed this document (and thus
computed the norm value)?

On Thu, Jun 11, 2020 at 8:45 AM Trejkaz <[email protected]> wrote:

> Well,
>
> We're using the default Lucene similarity. But as far as I know, we've
> always disabled norms as well. So I'm surprised I'm even seeing norms
> mentioned in the context of our own index, which is why I wondered
> whether -1 might have been an older placeholder for "no value" which
> later became 0 or something.
>
> About the only thing I'm sure about at the moment is that whatever is
> going on is weird.
>
> TX
>
> On Thu, 11 Jun 2020 at 15:38, Adrien Grand <[email protected]> wrote:
> >
> > Hi Trejkaz,
> >
> > Negative norm values are legal. The problem here is that Lucene expects
> > that documents that have no terms must either not have a norm value
> > (typically because the document doesn't have a value for the field), or a
> > norm value equal to 0 (typically because the token stream over the field
> > value produced no tokens).
> >
> > Are you using a custom similarity or one of the Lucene ones? One would only
> > get -1 as a norm with the Lucene similarities if it had a number of tokens
> > that is very close to Integer.MAX_VALUE.
> >
> > On Thu, Jun 11, 2020 at 4:22 AM Trejkaz <[email protected]> wrote:
> >
> > > Hi all.
> > >
> > > We use CheckIndex as a post-migration sanity check and are seeing this
> > > quirk, and I'm wondering whether negative norms are even legit or
> > > whether -1 should have been treated as if it were zero...
> > >
> > > TX
> > >
> > >
> > > 0.00% total deletions; 378 documents; 0 deleteions
> > > Segments file=segments_1 numSegments=1 version=8.5.1
> > > id=52isly98kogao7j0cnautwknj
> > >   1 of 1: name=_0 maxDoc=378
> > >     version=8.5.1
> > >     id=52isly98kogao7j0cnautwkni
> > >     codec=Lucene84
> > >     compound=false
> > >     numFiles=18
> > >     size (MB)=0.663
> > >     diagnostics = {java.vendor=Oracle Corporation, os=Mac OS X,
> > > java.version=1.8.0_191, java.vm.version=25.191-b12,
> > > lucene.version=8.5.1, os.arch=x86_64,
> > > java.runtime.version=1.8.0_191-b12, source=addIndexes(CodecReader...),
> > > os.version=10.15.5, timestamp=1591841756208}
> > >     no deletions
> > >     test: open reader.........OK [took 0.004 sec]
> > >     test: check integrity.....OK [took 0.002 sec]
> > >     test: check live docs.....OK [took 0.000 sec]
> > >     test: field infos.........OK [36 fields] [took 0.000 sec]
> > >     test: field norms.........OK [26 fields] [took 0.001 sec]
> > >     test: terms, freq, prox...ERROR: java.lang.RuntimeException:
> > > Document 0 doesn't have terms according to postings but has a norm
> > > value that is not zero: -1
> > >
> > > java.lang.RuntimeException: Document 0 doesn't have terms according to
> > > postings but has a norm value that is not zero: -1
> > > at org.apache.lucene.index.CheckIndex.checkFields(CheckIndex.java:1678)
> > > at org.apache.lucene.index.CheckIndex.testPostings(CheckIndex.java:1871)
> > > at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:724)
> > > at org.apache.lucene.index.CheckIndex.doCheck(CheckIndex.java:2973)
> > >
> > >     test: stored fields.......OK [15935 total field count; avg 42.2
> > > fields per doc] [took 0.003 sec]
> > >     test: term vectors........OK [1173 total term vector count; avg
> > > 3.1 term/freq vector fields per doc] [took 0.170 sec]
> > >     test: docvalues...........OK [16 docvalues fields; 11 BINARY; 2
> > > NUMERIC; 0 SORTED; 2 SORTED_NUMERIC; 1 SORTED_SET] [took 0.003 sec]
> > >     test: points..............OK [4 fields, 1509 points] [took 0.000 sec]
> > > FAILED
> > >     WARNING: exorciseIndex() would remove reference to this segment;
> > > full exception:
> > > java.lang.RuntimeException: Term Index test failed
> > > at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:750)
> > > at org.apache.lucene.index.CheckIndex.doCheck(CheckIndex.java:2973)
> > >
> > > WARNING: 1 broken segments (containing 378 documents) detected
> > > Took 0.355 sec total.
> > > WARNING: would write new segments file, and 378 documents would be
> > > lost, if -exorcise were specified
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: [email protected]
> > > For additional commands, e-mail: [email protected]
> > >
> > >
> >
> > --
> > Adrien
>

-- 
Adrien
