Thanks Otis! What do you mean by building it in batches? Do you mean I should close the IndexWriter every 1000 rows and reopen it? Does that release the references to the document objects so that they can be garbage-collected?
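Here's a rough sketch of what I'm imagining, just to make sure we're talking about the same thing (the index path, the StandardAnalyzer choice, and the 1000-row batch size are placeholders I picked, not anything from your mail):

    // Assumes the usual Lucene.Net namespaces from the DotLucene port.
    using System.Data.SqlClient;
    using Lucene.Net.Analysis.Standard;
    using Lucene.Net.Index;

    private void IndexAllRows(SqlDataReader rdr) {
        const int batchSize = 1000; // arbitrary; just a guess at a batch size
        int count = 0;
        // Create a fresh index on the first open, then append afterwards.
        IndexWriter iw = new IndexWriter("index", new StandardAnalyzer(), true);
        while (rdr.Read()) {
            IndexRow(rdr, iw); // same per-row method as in my first mail
            if (++count % batchSize == 0) {
                // Close to flush buffered docs to disk and drop references...
                iw.Close();
                // ...then reopen in append mode for the next batch.
                iw = new IndexWriter("index", new StandardAnalyzer(), false);
            }
        }
        iw.Optimize(); // only once, at the very end
        iw.Close();
    }

Is that roughly it, or is batching just the DB reads (without closing the writer) enough?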
I'm calling optimize() only at the end. I agree that 1500 documents is very small. I'm building the index on a PC with 512 MB of RAM, and the indexing process quickly gobbles up around 400 MB by the time I've indexed around 1,800 documents, at which point the whole machine grinds to a virtual halt. I'm using the latest DotLucene .NET port, so maybe there's a memory leak in it.

I have experience with AltaVista search (acquired by FastSearch), where I used to call MakeStable() every 20,000 documents to flush memory structures to disk. There doesn't seem to be an equivalent in Lucene.

-- Homam


--- Otis Gospodnetic <[EMAIL PROTECTED]> wrote:

> Hello,
>
> There are a few things you can do:
>
> 1) Don't just pull all rows from the DB at once. Do that in batches.
>
> 2) If you can get a Reader from your SqlDataReader, consider this:
> http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/document/Field.html#Text(java.lang.String,%20java.io.Reader)
>
> 3) Give the JVM more memory to play with by using the -Xms and -Xmx JVM
> parameters.
>
> 4) See IndexWriter's minMergeDocs parameter.
>
> 5) Are you calling optimize() at some point by any chance? Leave that
> call for the end.
>
> 1500 documents with 30 columns of short String/number values is not a
> lot. You may be doing something else, not Lucene related, that's
> slowing things down.
>
> Otis
>
> --- "Homam S.A." <[EMAIL PROTECTED]> wrote:
>
> > I'm trying to index a large number of records from the DB (a few
> > millions). Each record will be stored as a document with about 30
> > fields, most of them UnStored, representing small strings or
> > numbers. No huge DB Text fields.
> >
> > But I'm running out of memory very fast, and the indexing slows to
> > a crawl once I hit around 1500 records. The problem is that each
> > document holds references to the string objects returned from
> > ToString() on the DB fields, and the IndexWriter holds references
> > to all these document objects in memory, so the garbage collector
> > never gets a chance to clean them up.
> >
> > How do you guys go about indexing a large DB table? Here's a
> > snippet of my code (this method is called for each record in the
> > DB):
> >
> > private void IndexRow(SqlDataReader rdr, IndexWriter iw) {
> >     Document doc = new Document();
> >     for (int i = 0; i < BrowseFieldNames.Length; i++) {
> >         doc.Add(Field.UnStored(BrowseFieldNames[i],
> >             rdr.GetValue(i).ToString()));
> >     }
> >     iw.AddDocument(doc);
> > }
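P.S. Regarding your point 4: if minMergeDocs controls how many documents Lucene buffers in RAM before flushing a segment to disk, that sounds like the closest equivalent to AltaVista's MakeStable(). I'm picturing something like the lines below, assuming the .NET port exposes the same public fields as Java Lucene 1.4 (minMergeDocs, mergeFactor) -- I haven't verified the names in DotLucene:

    // Assumption: DotLucene keeps Lucene 1.4's public IndexWriter fields.
    IndexWriter iw = new IndexWriter("index", new StandardAnalyzer(), true);
    iw.minMergeDocs = 1000; // buffer ~1000 docs in RAM before writing a segment
    iw.mergeFactor = 10;    // default merge factor, left as-is here

Is that the right knob to turn?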