I just aborted a re-indexing operation (it was taking too much time - I will run
it overnight instead). But I was surprised by what I found in the index directory,
which contained a total of 1,402 index files! It started out with 36 files named
"_I9a.*", followed by groups of 72 files with names like "_17si.*", and so
forth.
Is this normal?
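For reference, here is a minimal sketch of how the per-group file counts above could be tallied. The class name SegmentFileCounter is hypothetical and not part of Lucene, and the grouping rule (everything before the first '.' in a file name is the group name) is an assumption based on the file names I saw.

```java
import java.io.File;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper (not part of Lucene): counts index files per name prefix.
class SegmentFileCounter {

    // Assumption: the group name is everything before the first '.' in the file name.
    static String segmentOf(String fileName) {
        int dot = fileName.indexOf('.');
        return dot < 0 ? fileName : fileName.substring(0, dot);
    }

    // Returns a sorted map from group name to the number of files in that group.
    static Map<String, Integer> countBySegment(String[] fileNames) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String name : fileNames) {
            counts.merge(segmentOf(name), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        if (args.length < 1) {
            System.out.println("usage: java SegmentFileCounter <index directory>");
            return;
        }
        // File.list() returns null if the path is not a readable directory.
        String[] names = new File(args[0]).list();
        if (names == null) {
            System.out.println("cannot list " + args[0]);
            return;
        }
        for (Map.Entry<String, Integer> e : countBySegment(names).entrySet()) {
            System.out.println(e.getKey() + ": " + e.getValue() + " files");
        }
        System.out.println(names.length + " files total");
    }
}
```

Running it with the index directory as the only argument prints one line per group and a total, which makes it easy to compare the directory before and after a run.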
Also, I noticed that during the indexing it would chug along, indexing at a pretty
decent rate, and then, every so often (I would estimate every several hundred added
files) it would stop for perhaps 10 - 30 seconds (occasionally longer), doing a bunch
of disk activity. Then it would resume again - almost like it was optimizing. (I'm
doing this on a notebook, so the disk IO is probably fairly slow.)
Is this normal?
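To check whether the pauses line up with a fixed number of added files, the time taken by each add could be logged. The sketch below is only an illustration: PauseLogger and its threshold are hypothetical names of my own, and the timing is assumed to be measured by the caller with System.currentTimeMillis().

```java
// Hypothetical helper (not part of Lucene): flags operations that take
// at least a given number of milliseconds and counts how many were flagged.
class PauseLogger {
    private final long thresholdMillis;
    private int seen = 0;       // operations recorded so far
    private int slowCount = 0;  // operations at or above the threshold

    PauseLogger(long thresholdMillis) {
        this.thresholdMillis = thresholdMillis;
    }

    // Records one operation; returns true if it counts as a pause.
    boolean record(String label, long elapsedMillis) {
        seen++;
        if (elapsedMillis >= thresholdMillis) {
            slowCount++;
            System.out.println("pause at operation #" + seen + " (" + label + "): "
                    + elapsedMillis + " ms");
            return true;
        }
        return false;
    }

    int slowCount() {
        return slowCount;
    }
}
```

In the indexing code, the call to writer.addDocument(...) would be wrapped with two System.currentTimeMillis() readings, and the difference passed to record() together with the file name. The operation numbers printed for each pause then show whether they recur at a regular interval.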
Regards,
Terry
PS: The code I'm using to do the indexing is below:
import npg1.search.WebExecAnalyzer;
import org.apache.lucene.index.IndexWriter;
import npg1.search.WESimilarity2;
import npg1.search.WPDocument2a;
import java.io.File;
import java.util.Date;
class IndexWPFiles2a {

    public static void main(String[] args) {
        // args[0] = location of target directory to be indexed
        // args[1] = location of index directory (in which to create index files)
        // args[2] = if present, open the existing index instead of creating a new one
        System.out.println("starting");
        try {
            Date start = new Date();
            String target = "c:/master_db/master_xml";
            // Check the array length: reading args[0] when no arguments are
            // given throws ArrayIndexOutOfBoundsException.
            if (args.length > 0) {
                target = args[0];
            }
            String index = "c:/master_db/master_index";
            if (args.length > 1) {
                index = args[1];
            }
            IndexWriter writer = null;
            if (args.length < 3) {
                // Create a new index, replacing any existing one.
                writer = new IndexWriter(index, new WebExecAnalyzer(), true);
                writer.mergeFactor = 50;
                writer.setSimilarity(new WESimilarity2());
                indexDocs(writer, new File(target));
            } else {
                // Open the existing index. No documents are added in this
                // branch, so the index is only optimized below.
                writer = new IndexWriter(index, new WebExecAnalyzer(), false);
                writer.setSimilarity(new WESimilarity2());
            }
            writer.optimize();
            writer.close();
            Date end = new Date();
            System.out.print(end.getTime() - start.getTime());
            System.out.println(" total milliseconds");
        } catch (Exception e) {
            System.out.println(" caught a " + e.getClass() +
                    "\n with message: " + e.getMessage());
        }
    }
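
    // Recursively walks the given file or directory and adds each file to the index.
    public static void indexDocs(IndexWriter writer, File file)
            throws Exception {
        //System.out.println("starting indexing with internal path");
        if (file.isDirectory()) {
            String[] files = file.list();
            // File.list() returns null if the directory cannot be read.
            if (files == null) {
                System.out.println("cannot list " + file);
                return;
            }
            for (int i = 0; i < files.length; i++) {
                //System.out.println("recursive call");
                indexDocs(writer, new File(file, files[i]));
            }
        } else {
            try {
                System.out.println("adding " + file);
                writer.addDocument(WPDocument2a.Document(file));
            } catch (Exception e) {
                // Report the failure and continue with the remaining files.
                System.out.println("error adding " + file + " - Exception: " + e.getMessage());
            }
        }
    }
}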