Re: corrupt index: .fdx and stored norms

Doron Cohen Tue, 10 Oct 2006 14:05:58 -0700

Nick, could you provide additional info:
(1) Env info - Lucene version, Java version, OS, JVM args (e.g. -XmNNN),
etc...
(2) Is this reproducible? By the file sizes there seem to be ~182K indexed
docs when the problem occurs, so if this is reproducible it would hopefully
not take too long. If reproducible, I wonder if you can also create it
without storing any field... (should go faster).


- Doron

"NIck P" <[EMAIL PROTECTED]> wrote on 10/10/2006 12:24:19:

> Hi, I sent this 30 min ago and it didn't seem to go through, so I'm
> trying again. I apologize if two copies finally arrive.
>
> I am working on the development of a product that is using Lucene. A
> corrupt index was reported by testers and it is in an odd state.
> The indexes are built in batches (to multiple ram indexes in parallel)
> and then eventually merged into a disk index with
> IndexWriter.addIndexes(Directory[]).
> Somehow the index got corrupted; there were no indications of a crash or
> errors in the log. The failure is in SegmentMerger.mergeNorms:
>   private void mergeNorms() throws IOException {
>     for (int i = 0; i < fieldInfos.size(); i++) {
>       FieldInfo fi = fieldInfos.fieldInfo(i);
>       if (fi.isIndexed && !fi.omitNorms) {
>         IndexOutput output = directory.createOutput(segment + ".f" + i);
>         try {
>           for (int j = 0; j < readers.size(); j++) {
>             IndexReader reader = (IndexReader) readers.elementAt(j);
>             int maxDoc = reader.maxDoc();
>             byte[] input = new byte[maxDoc];
>             reader.norms(fi.name, input, 0);  // <==== ERROR HERE
>             for (int k = 0; k < maxDoc; k++) {
>               if (!reader.isDeleted(k)) {
>                 output.writeByte(input[k]);
>               }
>             }
>           }
>         } finally {
>           output.close();
>         }
>       }
>     }
>   }
>
> The problem is that the maxDoc() returned by the indexReader
> (FieldsReader in this case) is larger than the size, in bytes, of the
> norms file. There is then an error in IndexInput.read(byte[], int,
> int) because there is not enough data in the file to read.
> Here is part of the directory listing (there are many stored fields of
> the same size so omitting all but first 3):
>
> -rw-r--r--  1 icmadmin db2grp1        811 Sep 27 20:48 _a4.fnm
> -rw-r--r--  1 icmadmin db2grp1    1451696 Sep 27 20:49 _a4.fdx
> -rw-r--r--  1 icmadmin db2grp1   12736304 Sep 27 20:49 _a4.fdt
> -rw-r--r--  1 icmadmin db2grp1 5648544509 Sep 27 21:30 _a4.prx
> -rw-r--r--  1 icmadmin db2grp1 1695149231 Sep 27 21:30 _a4.frq
> -rw-r--r--  1 icmadmin db2grp1   45688880 Sep 27 21:30 _a4.tis
> -rw-r--r--  1 icmadmin db2grp1     673588 Sep 27 21:30 _a4.tii
> -rw-r--r--  1 icmadmin db2grp1     181159 Sep 27 21:30 _a4.f2
> -rw-r--r--  1 icmadmin db2grp1     181159 Sep 27 21:30 _a4.f1
> -rw-r--r--  1 icmadmin db2grp1     181159 Sep 27 21:30 _a4.f0
>
> From looking at the code, sizeof(.fdx)/8 should equal sizeof(.f0),
> but it doesn't in this case.
>
> Any ideas? Also, I wasn't sure if this was more appropriate for dev
> or user, so I guessed user.
>
> -Nick
> (programmer working @ ibm)
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
>
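For reference, the size check described in the quoted message can be worked
through with the numbers from the directory listing. This is only a rough
sketch: it takes the stated invariant (8 bytes per document in .fdx, 1 byte
per document in each norms file) as an assumption from the quoted text, uses
plain java.io rather than any Lucene API, and the class and method names are
made up for illustration.

```java
import java.io.File;

public class NormsSizeCheck {
    // Assumption taken from the quoted message: .fdx holds one 8-byte
    // entry per document, and a norms file holds one byte per document.
    static final long FDX_ENTRY_BYTES = 8;

    // Number of documents implied by the size of the .fdx file.
    static long docsFromFdx(long fdxBytes) {
        return fdxBytes / FDX_ENTRY_BYTES;
    }

    // How many bytes the norms file is short of the .fdx document count.
    // Zero means the two files agree under the assumption above.
    static long missingNormBytes(long fdxBytes, long normsBytes) {
        return docsFromFdx(fdxBytes) - normsBytes;
    }

    public static void main(String[] args) {
        // Defaults are the _a4.fdx and _a4.f0 sizes from the listing.
        long fdxBytes = 1451696L;
        long normsBytes = 181159L;
        if (args.length == 2) {
            // Optionally pass paths to a .fdx file and a norms file.
            fdxBytes = new File(args[0]).length();
            normsBytes = new File(args[1]).length();
        }
        System.out.println("docs per .fdx:    " + docsFromFdx(fdxBytes));
        System.out.println("norms file bytes: " + normsBytes);
        System.out.println("shortfall:        "
                + missingNormBytes(fdxBytes, normsBytes));
    }
}
```

With the listed sizes this gives 1451696 / 8 = 181462 documents against
181159 norm bytes, a shortfall of 303 bytes, which matches the symptom of
the read running past the end of the norms file.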

