Nick, could you provide additional info:
(1) Env info - Lucene version, Java version, OS, JVM args (e.g. -Xmx/-Xms settings), etc.
(2) Is this reproducible? By the file sizes there seem to be roughly 181K indexed docs when the problem occurs, so if it is reproducible it would hopefully not take too long. If it is reproducible, I wonder if you can also recreate it without storing any fields (that should go faster).
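For reference, here is a minimal sketch of the size check implied by the listing quoted below. It assumes, as Nick states, that the .fdx file holds one 8-byte entry per document and that each norms file (.fN) holds one byte per document. The class and method names are made up for illustration and are not part of Lucene.

```java
// Hypothetical helper, not part of Lucene: compares the document count
// implied by the .fdx file with the length of a norms file.
public class NormsSizeCheck {

    // Assumption: one 8-byte pointer per document in the .fdx file.
    static final long FDX_BYTES_PER_DOC = 8L;

    // Number of documents implied by the size of the .fdx file.
    public static long docsFromFdx(long fdxBytes) {
        return fdxBytes / FDX_BYTES_PER_DOC;
    }

    // How many norm bytes are missing, assuming one norm byte per document.
    // A positive result means the norms file is shorter than maxDoc.
    public static long missingNormBytes(long fdxBytes, long normBytes) {
        return docsFromFdx(fdxBytes) - normBytes;
    }

    public static void main(String[] args) {
        long fdxBytes = 1451696L;  // size of _a4.fdx from the listing
        long normBytes = 181159L;  // size of _a4.f0 from the listing

        System.out.println("docs implied by .fdx: " + docsFromFdx(fdxBytes));
        System.out.println("norm bytes missing:   " + missingNormBytes(fdxBytes, normBytes));
    }
}
```

With the sizes from the listing this gives 181462 documents according to .fdx against 181159 norm bytes, a shortfall of 303 bytes. That matches the read past end-of-file in mergeNorms.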
- Doron

"NIck P" <[EMAIL PROTECTED]> wrote on 10/10/2006 12:24:19:

> Hi, I sent this 30 min ago and it didn't seem to go through so I'm
> trying again; I apologize if two copies finally arrive.
>
> I am working on the development of a product that is using Lucene. A
> corrupt index was reported by testers and it is in an odd state.
> The indexes are built in batches (to multiple RAM indexes in parallel)
> and then eventually merged into a disk index with
> IndexWriter.addIndexes(Directory[]).
> Somehow the index got corrupted; there were no indications of a crash or
> errors in the log. The failure is in SegmentMerger.mergeNorms:
>
>   private void mergeNorms() throws IOException {
>     for (int i = 0; i < fieldInfos.size(); i++) {
>       FieldInfo fi = fieldInfos.fieldInfo(i);
>       if (fi.isIndexed && !fi.omitNorms) {
>         IndexOutput output = directory.createOutput(segment + ".f" + i);
>         try {
>           for (int j = 0; j < readers.size(); j++) {
>             IndexReader reader = (IndexReader) readers.elementAt(j);
>             int maxDoc = reader.maxDoc();
>             byte[] input = new byte[maxDoc];
>             reader.norms(fi.name, input, 0);   <==== ERROR HERE
>             for (int k = 0; k < maxDoc; k++) {
>               if (!reader.isDeleted(k)) {
>                 output.writeByte(input[k]);
>               }
>             }
>           }
>         } finally {
>           output.close();
>         }
>       }
>     }
>   }
>
> The problem is that the maxDoc() returned by the IndexReader
> (FieldsReader in this case) is larger than the size, in bytes, of the
> norms file. Then there is an error in IndexInput.read(byte[], int,
> int) because there is not enough data in the file to read.
>
> Here is part of the directory listing (there are many norms files of
> the same size, so I am omitting all but the first 3):
>
> -rw-r--r-- 1 icmadmin db2grp1        811 Sep 27 20:48 _a4.fnm
> -rw-r--r-- 1 icmadmin db2grp1    1451696 Sep 27 20:49 _a4.fdx
> -rw-r--r-- 1 icmadmin db2grp1   12736304 Sep 27 20:49 _a4.fdt
> -rw-r--r-- 1 icmadmin db2grp1 5648544509 Sep 27 21:30 _a4.prx
> -rw-r--r-- 1 icmadmin db2grp1 1695149231 Sep 27 21:30 _a4.frq
> -rw-r--r-- 1 icmadmin db2grp1   45688880 Sep 27 21:30 _a4.tis
> -rw-r--r-- 1 icmadmin db2grp1     673588 Sep 27 21:30 _a4.tii
> -rw-r--r-- 1 icmadmin db2grp1     181159 Sep 27 21:30 _a4.f2
> -rw-r--r-- 1 icmadmin db2grp1     181159 Sep 27 21:30 _a4.f1
> -rw-r--r-- 1 icmadmin db2grp1     181159 Sep 27 21:30 _a4.f0
>
> From looking at the code, sizeof(.fdx)/8 should equal sizeof(.f0),
> but it doesn't in this case.
>
> Any ideas? Also, I wasn't sure if this was more appropriate for dev
> or user, so I guessed user.
>
> -Nick
> (programmer working @ ibm)
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]