Re: corrupt index: .fdx and stored norms

Doron Cohen Tue, 10 Oct 2006 14:17:03 -0700

I meant ~182K docs ...
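
As a rough check of the sizes in the directory listing quoted below, here is a
minimal sketch of the arithmetic. It relies on two assumptions taken from the
quoted message rather than from any Lucene API: that .fdx holds one 8-byte
entry per document, and that each norms file (.f0, .f1, ...) holds one byte per
document. The class and method names are hypothetical, for illustration only.

```java
public class NormsSizeCheck {
    // Assumption from the quoted message: .fdx stores one 8-byte entry per document.
    static final long FDX_BYTES_PER_DOC = 8L;

    // Number of documents implied by the size of the .fdx file.
    static long docCountFromFdx(long fdxBytes) {
        return fdxBytes / FDX_BYTES_PER_DOC;
    }

    // Assumption: a norms file holds one byte per document, so any positive
    // result here is the number of norm bytes the reader would fail to find.
    static long missingNormBytes(long fdxBytes, long normsBytes) {
        return docCountFromFdx(fdxBytes) - normsBytes;
    }

    public static void main(String[] args) {
        long fdxBytes = 1451696L;   // size of _a4.fdx in the listing
        long normsBytes = 181159L;  // size of _a4.f0 in the listing
        System.out.println("docs implied by .fdx: " + docCountFromFdx(fdxBytes));
        System.out.println("norm bytes missing:   " + missingNormBytes(fdxBytes, normsBytes));
    }
}
```

With the sizes from the listing this gives 181462 documents against 181159 norm
bytes, a shortfall of 303 bytes per norms file.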

> Nick, could you provide additional info:
> (1) Env info - Lucene version, Java version, OS, JVM args (e.g. -XmNNN),
> etc...
> (2) Is this reproducible? Judging by the file sizes, there seem to be ~182
> indexed docs when the problem occurs, so if it is reproducible it would
> hopefully not take too long. If it is reproducible, I wonder whether you
> can also reproduce it without storing any fields (that should go faster).
>
> - Doron
>
> "NIck P" <[EMAIL PROTECTED]> wrote on 10/10/2006 12:24:19:
>
> > Hi, I sent this 30 minutes ago and it didn't seem to go through, so I'm
> > trying again. I apologize if two copies eventually arrive.
> >
> > I am working on the development of a product that is using Lucene. A
> > corrupt index was reported by testers and it is in an odd state.
> > The indexes are built in batches (to multiple ram indexes in parallel)
> > and then eventually merged into a disk index with
> > IndexWriter.addIndexes(Directory[]).
> > Somehow the index got corrupted; there were no indications of a crash
> > or errors in the log. The failure is in SegmentMerger.mergeNorms:
> >   private void mergeNorms() throws IOException {
> >     for (int i = 0; i < fieldInfos.size(); i++) {
> >       FieldInfo fi = fieldInfos.fieldInfo(i);
> >       if (fi.isIndexed && !fi.omitNorms) {
> >         IndexOutput output = directory.createOutput(segment + ".f" + i);
> >         try {
> >           for (int j = 0; j < readers.size(); j++) {
> >             IndexReader reader = (IndexReader) readers.elementAt(j);
> >             int maxDoc = reader.maxDoc();
> >             byte[] input = new byte[maxDoc];
> >             reader.norms(fi.name, input, 0);  // <==== ERROR HERE
> >             for (int k = 0; k < maxDoc; k++) {
> >               if (!reader.isDeleted(k)) {
> >                 output.writeByte(input[k]);
> >               }
> >             }
> >           }
> >         } finally {
> >           output.close();
> >         }
> >       }
> >     }
> >   }
> >
> > The problem is that the maxDoc() returned by the IndexReader
> > (FieldsReader in this case) is larger than the size, in bytes, of the
> > norms file. An error then occurs in IndexInput.read(byte[], int, int)
> > because there is not enough data in the file to read.
> > Here is part of the directory listing (there are many stored fields of
> > the same size so omitting all but first 3):
> >
> > -rw-r--r--  1 icmadmin db2grp1        811 Sep 27 20:48 _a4.fnm
> > -rw-r--r--  1 icmadmin db2grp1    1451696 Sep 27 20:49 _a4.fdx
> > -rw-r--r--  1 icmadmin db2grp1   12736304 Sep 27 20:49 _a4.fdt
> > -rw-r--r--  1 icmadmin db2grp1 5648544509 Sep 27 21:30 _a4.prx
> > -rw-r--r--  1 icmadmin db2grp1 1695149231 Sep 27 21:30 _a4.frq
> > -rw-r--r--  1 icmadmin db2grp1   45688880 Sep 27 21:30 _a4.tis
> > -rw-r--r--  1 icmadmin db2grp1     673588 Sep 27 21:30 _a4.tii
> > -rw-r--r--  1 icmadmin db2grp1     181159 Sep 27 21:30 _a4.f2
> > -rw-r--r--  1 icmadmin db2grp1     181159 Sep 27 21:30 _a4.f1
> > -rw-r--r--  1 icmadmin db2grp1     181159 Sep 27 21:30 _a4.f0
> >
> > From looking at the code, sizeof(.fdx)/8 should equal sizeof(.f0),
> > but it doesn't in this case.
> >
> > Any ideas? Also, I wasn't sure whether this was more appropriate for
> > dev or user, so I guessed user.
> >
> > -Nick
> > (programmer working @ ibm)
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: [EMAIL PROTECTED]
> > For additional commands, e-mail: [EMAIL PROTECTED]
> >
