hi to all, please help! I think I've mixed my brain up with this stuff already...
I'm trying to index about 29 text files; the biggest is ~700 MB and the smallest ~300 MB. I once managed to build the whole index, with mergeFactor=10 and maxMergeDocs=10000. That took more than 35 hours, I think (I don't know exactly), and it didn't use much RAM (though it could have). Unfortunately I had a call to optimize() at the end, and during that optimization an IOException ("File too big") occurred while merging.

Since I run the program on a multi-processor machine, I have now changed the code to index each file in its own thread, with all threads writing to a single IndexWriter. The mergeFactor is still 10, maxMergeDocs is now 1,000,000, and I set the maximum heap size to 1 MB. I tried using a RAMDirectory (as mentioned on the mailing list) and also just plain IndexWriter.addDocument(); at the moment neither seems to make any difference. After a while _all_ the threads exit, one after another (not all at once!), with an OutOfMemoryError. All of them run at minimum priority.

Even if the multithreading doesn't improve performance, I would be glad just to get it running again at all. I would be even happier if someone could give me a hint about the best way to index this amount of data. (The average size of an entry that gets parsed into a Document is about 1 KB.)

thanks for any help!
chantal
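
P.S. In case it helps to see what I mean, here is roughly what the threaded version boils down to. The index path, the "contents" field name, and the one-Document-per-line parsing are simplified stand-ins for my real parser; mergeFactor and maxMergeDocs are set via the public fields on IndexWriter in the Lucene version I'm using.

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;

public class ThreadedIndexer {

    public static void main(String[] args) throws IOException, InterruptedException {
        // One writer shared by all threads; "true" creates a fresh index.
        final IndexWriter writer =
            new IndexWriter("/tmp/bigindex", new StandardAnalyzer(), true);
        writer.mergeFactor = 10;
        writer.maxMergeDocs = 1000000;

        // One thread per input file, all at minimum priority.
        Thread[] threads = new Thread[args.length];
        for (int i = 0; i < args.length; i++) {
            final String file = args[i];
            threads[i] = new Thread(new Runnable() {
                public void run() {
                    try {
                        indexFile(writer, file);
                    } catch (IOException e) {
                        e.printStackTrace();
                    }
                }
            });
            threads[i].setPriority(Thread.MIN_PRIORITY);
            threads[i].start();
        }
        for (int i = 0; i < threads.length; i++) {
            threads[i].join();
        }
        // No optimize() here for now, because of the "File too big" problem.
        writer.close();
    }

    // Reads one large text file line by line and adds each entry
    // (~1 KB of text) as a single Document.
    static void indexFile(IndexWriter writer, String path) throws IOException {
        BufferedReader in = new BufferedReader(new FileReader(path));
        try {
            String line;
            while ((line = in.readLine()) != null) {
                Document doc = new Document();
                doc.add(Field.Text("contents", line));
                writer.addDocument(doc);
            }
        } finally {
            in.close();
        }
    }
}

(I join all the threads before closing the writer, and I left out the optimize() call for now.)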
