+1
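
For what it's worth, here is a rough sketch of the unsigned formatting Mike suggests. It assumes the norm is a single byte that was sign-extended to a long, which as far as I know is what the built-in similarities produce. The class and method names are hypothetical, and this is not the actual CheckIndex code:

```java
final class NormDisplay {

    // Hypothetical helper: render a norm as an unsigned byte value.
    // Assumption: the norm fits in one byte (sign-extended to long).
    // A custom similarity may use the full long range, so values
    // outside the byte range are rejected instead of being truncated.
    static int normAsUnsignedByte(long norm) {
        if (norm < Byte.MIN_VALUE || norm > Byte.MAX_VALUE) {
            throw new IllegalArgumentException("norm is not a single byte: " + norm);
        }
        return Byte.toUnsignedInt((byte) norm);
    }

    public static void main(String[] args) {
        // The value from the CheckIndex report quoted below.
        System.out.println(normAsUnsignedByte(-1L));
        // A norm of zero is unchanged by the conversion.
        System.out.println(normAsUnsignedByte(0L));
    }
}
```

Under that assumption the -1 from the report would be displayed as 255, the largest value a one-byte norm can take.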

On Thu, Jun 11, 2020 at 3:27 PM Michael McCandless <luc...@mikemccandless.com> wrote:

> Maybe we should fix CheckIndex to print norms as unsigned integers?
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
>
> On Thu, Jun 11, 2020 at 3:00 AM Adrien Grand <jpou...@gmail.com> wrote:
>
> > To my knowledge, -1 always represented the maximum supported length, both
> > before and after 7.0 (when we changed the norms encoding). One thing that
> > changed when we introduced sparse norms is that documents with no value
> > moved from having 0 as a norm to not having a norm at all, but I don't see
> > how this could explain what you are seeing either.
> >
> > Do you know which Lucene version initially indexed this document
> > (and thus computed the norm value)?
> >
> > On Thu, Jun 11, 2020 at 8:45 AM Trejkaz <trej...@trypticon.org> wrote:
> >
> > > Well,
> > >
> > > We're using the default Lucene similarity. But as far as I know, we've
> > > always disabled norms as well. So I'm surprised I'm even seeing norms
> > > mentioned in the context of our own index, which is why I wondered
> > > whether -1 might have been an older placeholder for "no value" which
> > > later became 0 or something.
> > >
> > > About the only thing I'm sure about at the moment is that whatever is
> > > going on is weird.
> > >
> > > TX
> > >
> > > On Thu, 11 Jun 2020 at 15:38, Adrien Grand <jpou...@gmail.com> wrote:
> > > >
> > > > Hi Trejkaz,
> > > >
> > > > Negative norm values are legal. The problem here is that Lucene expects
> > > > documents that have no terms to either have no norm value (typically
> > > > because the document doesn't have a value for the field) or have a
> > > > norm value equal to 0 (typically because the token stream over the
> > > > field value produced no tokens).
> > > >
> > > > Are you using a custom similarity or one of the Lucene ones? One would
> > > > only get -1 as a norm with the Lucene similarities if the field had a
> > > > number of tokens that is very close to Integer.MAX_VALUE.
> > > >
> > > > On Thu, Jun 11, 2020 at 4:22 AM Trejkaz <trej...@trypticon.org> wrote:
> > > >
> > > > > Hi all.
> > > > >
> > > > > We use CheckIndex as a post-migration sanity check and are seeing this
> > > > > quirk, and I'm wondering whether negative norms are even legit or
> > > > > whether they should have been treated as if they were zero...
> > > > >
> > > > > TX
> > > > >
> > > > >
> > > > > 0.00% total deletions; 378 documents; 0 deleteions
> > > > > Segments file=segments_1 numSegments=1 version=8.5.1
> > > > > id=52isly98kogao7j0cnautwknj
> > > > >   1 of 1: name=_0 maxDoc=378
> > > > >     version=8.5.1
> > > > >     id=52isly98kogao7j0cnautwkni
> > > > >     codec=Lucene84
> > > > >     compound=false
> > > > >     numFiles=18
> > > > >     size (MB)=0.663
> > > > >     diagnostics = {java.vendor=Oracle Corporation, os=Mac OS X,
> > > > > java.version=1.8.0_191, java.vm.version=25.191-b12,
> > > > > lucene.version=8.5.1, os.arch=x86_64,
> > > > > java.runtime.version=1.8.0_191-b12, source=addIndexes(CodecReader...),
> > > > > os.version=10.15.5, timestamp=1591841756208}
> > > > >     no deletions
> > > > >     test: open reader.........OK [took 0.004 sec]
> > > > >     test: check integrity.....OK [took 0.002 sec]
> > > > >     test: check live docs.....OK [took 0.000 sec]
> > > > >     test: field infos.........OK [36 fields] [took 0.000 sec]
> > > > >     test: field norms.........OK [26 fields] [took 0.001 sec]
> > > > >     test: terms, freq, prox...ERROR: java.lang.RuntimeException:
> > > > > Document 0 doesn't have terms according to postings but has a norm
> > > > > value that is not zero: -1
> > > > >
> > > > > java.lang.RuntimeException: Document 0 doesn't have terms according to
> > > > > postings but has a norm value that is not zero: -1
> > > > > at org.apache.lucene.index.CheckIndex.checkFields(CheckIndex.java:1678)
> > > > > at org.apache.lucene.index.CheckIndex.testPostings(CheckIndex.java:1871)
> > > > > at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:724)
> > > > > at org.apache.lucene.index.CheckIndex.doCheck(CheckIndex.java:2973)
> > > > >
> > > > >     test: stored fields.......OK [15935 total field count; avg 42.2
> > > > > fields per doc] [took 0.003 sec]
> > > > >     test: term vectors........OK [1173 total term vector count; avg
> > > > > 3.1 term/freq vector fields per doc] [took 0.170 sec]
> > > > >     test: docvalues...........OK [16 docvalues fields; 11 BINARY; 2
> > > > > NUMERIC; 0 SORTED; 2 SORTED_NUMERIC; 1 SORTED_SET] [took 0.003 sec]
> > > > >     test: points..............OK [4 fields, 1509 points] [took 0.000 sec]
> > > > > FAILED
> > > > >     WARNING: exorciseIndex() would remove reference to this segment;
> > > > > full exception:
> > > > > java.lang.RuntimeException: Term Index test failed
> > > > > at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:750)
> > > > > at org.apache.lucene.index.CheckIndex.doCheck(CheckIndex.java:2973)
> > > > >
> > > > > WARNING: 1 broken segments (containing 378 documents) detected
> > > > > Took 0.355 sec total.
> > > > > WARNING: would write new segments file, and 378 documents would be
> > > > > lost, if -exorcise were specified
> > > > >
> > > > >
> > > > > ---------------------------------------------------------------------
> > > > > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> > > > > For additional commands, e-mail: java-user-h...@lucene.apache.org
> > > > >
> > > > >
> > > >
> > > > --
> > > > Adrien
> > >
> > >
> > >
> >
> > --
> > Adrien
> >
>


-- 
Adrien
