--- Scott Smith <[EMAIL PROTECTED]> wrote:
> I have an application that is reading in XML files and indexing them.
> Each XML file is 3K-6K bytes. This application preloads a database that I
> will add to "on the fly" later. However, all I want it to do initially is
> take some existing files and create the initial index as quick as I can.
>
> Since I want to index "on the fly" later, I set the merge factor to 10.
> I'm assuming that I can't create the index initially with one merge factor
> (e.g., 100) and then change the merge factor later (true?).
I believe that assumption is wrong: you can change the merge factor at any time. I haven't tested this, though.

> What I see is that it takes 1-3 seconds per XML file to do the index.
> This means I'm indexing around 150k bytes per minute. I also notice that
> the CPU utilization rarely exceeds 5% (looking at Task Manager on a
> Windows box). I use Xerces to read in the files (SAX interface) and I
> don't close or optimize the index between stories, nor do I sleep
> anyplace. I've looked at the page fault numbers and they aren't changing
> much. I guess I would have expected that I would have pretty much pegged
> the CPU and seen much faster indexing.
>
> Any ideas/suggestions?

Check how much time the XML parsing is taking, and how much the actual indexing is taking. Lucene indexing is IO bound, not CPU bound, so what you are seeing (5% CPU usage) suggests that Lucene may be the bottleneck. But check your XML parsing code as well. Post the code if you want.

The 1.3 version has 2 other indexing parameters that you can use for tuning; try playing with those. You can also give the JVM more memory. One of my articles on the Resources page of Lucene's site covers this type of stuff.

Otis
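A minimal sketch of the "time the parsing and the indexing separately" check, in Java since Lucene is a Java library. It runs each file through a SAX parser, hands the collected text to an indexing step, and accumulates the two timings separately. The class name, the `Indexer` interface and the sample XML are hypothetical; the list used in `main` only stands in for the real Lucene indexing code, which is not shown here.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.helpers.DefaultHandler;

public class ParseVsIndexTimer {

    /** Hypothetical hook standing in for the real "add one document to Lucene" code. */
    interface Indexer {
        void index(String text) throws Exception;
    }

    /** SAX handler that only collects the character data of one XML file. */
    static class TextCollector extends DefaultHandler {
        final StringBuilder text = new StringBuilder();

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }
    }

    private final SAXParser parser;
    long parseNanos = 0; // total time spent inside the SAX parser
    long indexNanos = 0; // total time spent inside the Indexer
    int files = 0;       // number of files processed

    ParseVsIndexTimer() throws Exception {
        parser = SAXParserFactory.newInstance().newSAXParser();
    }

    /** Parses one file, hands its text to the indexer, and times both steps separately. */
    String processOne(InputStream xml, Indexer indexer) throws Exception {
        TextCollector handler = new TextCollector();
        long t0 = System.nanoTime();
        parser.parse(xml, handler);
        long t1 = System.nanoTime();
        String text = handler.text.toString();
        indexer.index(text);
        long t2 = System.nanoTime();
        parseNanos += t1 - t0;
        indexNanos += t2 - t1;
        files++;
        return text;
    }

    public static void main(String[] args) throws Exception {
        String[] samples = {
            "<story><title>First</title><body>Some text.</body></story>",
            "<story><title>Second</title><body>More text.</body></story>"
        };
        // Stand-in "index": just a list. Swap in the real Lucene code here,
        // e.g. build a Document and call IndexWriter.addDocument on it.
        List<String> fakeIndex = new ArrayList<>();
        ParseVsIndexTimer timer = new ParseVsIndexTimer();
        for (String s : samples) {
            InputStream in = new ByteArrayInputStream(s.getBytes(StandardCharsets.UTF_8));
            timer.processOne(in, fakeIndex::add);
        }
        System.out.printf("files=%d parse=%.3f ms index=%.3f ms%n",
                timer.files, timer.parseNanos / 1e6, timer.indexNanos / 1e6);
    }
}
```

Run over a few hundred of the real files: if the parse total dominates, the XML handling is the place to look; if the index total dominates, the merge factor, the other indexing parameters and the JVM memory mentioned above are the knobs to try.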
