Scott,

Here are some figures to use for comparison. Using the latest Lucene release, I index about 200 similar-sized XML files at a time on a Windows XP machine (2 GHz). First I create a new index, which adds the documents at a rate of about 8 per second (I don't recall what the CPU utilization is during this step). Then I merge this new index into the master one (using, I think, the default merge factor), which takes about 4.5 minutes, during which CPU utilization stays near 100%. The master index currently holds about 115,000 such documents.
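To put these figures on the same footing as the ones in the original message quoted below, here is a rough back-of-envelope sketch in Python. The 4,500-byte average file size is an assumption (the midpoint of the 3K-6K range in the original message), and the function names are purely illustrative; nothing here calls Lucene.

```python
# Back-of-envelope comparison of the figures in this thread.
# ASSUMPTION: average file size is the midpoint of the quoted 3K-6K range.
AVG_FILE_BYTES = 4500

# Figures from this reply.
BATCH_DOCS = 200              # files indexed per batch
ADD_RATE_DOCS_PER_SEC = 8.0   # rate while adding to the new index
MERGE_SECONDS = 4.5 * 60      # time to merge into the master index


def add_seconds(docs, rate):
    """Seconds spent adding `docs` documents at `rate` documents per second."""
    return docs / rate


def effective_rate(docs, add_rate, merge_seconds):
    """Documents per second once the merge time is counted as well."""
    return docs / (add_seconds(docs, add_rate) + merge_seconds)


def bytes_per_minute(docs_per_sec, avg_bytes=AVG_FILE_BYTES):
    """Convert a document rate into bytes indexed per minute."""
    return docs_per_sec * avg_bytes * 60


if __name__ == "__main__":
    add_only = bytes_per_minute(ADD_RATE_DOCS_PER_SEC)
    with_merge = bytes_per_minute(
        effective_rate(BATCH_DOCS, ADD_RATE_DOCS_PER_SEC, MERGE_SECONDS)
    )
    # The original message reports 1-3 seconds per file.
    slow = bytes_per_minute(1 / 3)
    fast = bytes_per_minute(1 / 1)

    print(f"add only:          {add_only:,.0f} bytes/min")
    print(f"add plus merge:    {with_merge:,.0f} bytes/min")
    print(f"1-3 s per file:    {slow:,.0f} to {fast:,.0f} bytes/min")
```

Under these assumptions the add step alone runs at about 2.16 MB per minute, while counting the 4.5-minute merge brings the effective rate for a 200-file batch down to roughly 183 KB per minute. That is the same order of magnitude as the 150k bytes per minute reported in the original message, although in my case the CPU is busy during the merge rather than idle.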
HTH,

Regards,

Terry

----- Original Message -----
From: "Scott Smith" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Monday, January 05, 2004 10:26 PM
Subject: Performance question

> I have an application that is reading in XML files and indexing them. Each
> XML file is 3K-6K bytes. This application preloads a database that I will
> add to "on the fly" later. However, all I want it to do initially is take
> some existing files and create the initial index as quick as I can.
>
> Since I want to index "on the fly" later, I set the merge factor to 10. I'm
> assuming that I can't create the index initially with one merge factor
> (e.g., 100) and then change the merge factor later (true?).
>
> What I see is that it takes 1-3 seconds per xml file to do the index. This
> means I'm indexing around 150k bytes per minute. I also notice that the CPU
> utilization rarely exceeds 5% (looking at task manager on a Windows box). I
> use Xerces to read in the files (SAX interface) and I don't close or
> optimize the index between stories nor do I sleep anyplace. I've looked at
> the page fault numbers and they aren't changing much. I guess I would have
> expected that I would have pretty much pegged the CPU and seen much faster
> indexing.
>
> Any ideas/suggestions?
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
