You might want to consider using LuSql, a high-performance, multithreaded, well-documented tool designed specifically for moving data from a JDBC database into Lucene (you didn't say whether your database is JDBC-accessible): http://lab.cisti-icist.nrc-cnrc.gc.ca/cistilabswiki/index.php/LuSql
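
On the multi-threading question in general: Lucene's IndexWriter is thread-safe, so several threads can add documents to one shared writer. Here is a minimal sketch of that pattern in plain Java, assuming contiguous numeric ids. It is not LuSql's code and it does not call Lucene or JDBC. The names BatchIndexSketch, CountingSink, splitIntoRanges, indexRange and indexAll are all hypothetical; CountingSink stands in for the shared writer and indexRange stands in for the per-range database query.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: not LuSql code and not the Lucene API.
class BatchIndexSketch {

    // Stand-in for one shared, thread-safe index writer; it only counts documents.
    static final class CountingSink {
        private final AtomicLong added = new AtomicLong();

        void addDocument(long id) {
            added.incrementAndGet();
        }

        long count() {
            return added.get();
        }
    }

    // Split the inclusive id range [firstId, lastId] into inclusive {from, to} batches.
    static List<long[]> splitIntoRanges(long firstId, long lastId, long batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<long[]> ranges = new ArrayList<>();
        for (long from = firstId; from <= lastId; from += batchSize) {
            long to = Math.min(from + batchSize - 1, lastId);
            ranges.add(new long[] {from, to});
        }
        return ranges;
    }

    // Stand-in for "query ids from..to, build a document per row, add it".
    // Assumption: every id in the range exists.
    static void indexRange(long from, long to, CountingSink sink) {
        for (long id = from; id <= to; id++) {
            sink.addDocument(id);
        }
    }

    // Run the batches on a fixed pool of worker threads that share one sink.
    // Returns the number of documents handed to the sink.
    static long indexAll(long firstId, long lastId, long batchSize, int threads) {
        CountingSink sink = new CountingSink();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (long[] r : splitIntoRanges(firstId, lastId, batchSize)) {
                pending.add(pool.submit(() -> indexRange(r[0], r[1], sink)));
            }
            for (Future<?> f : pending) {
                f.get(); // wait for the batch and surface any worker failure
            }
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
        return sink.count();
    }

    public static void main(String[] args) {
        // 1 million ids in batches of 10,000 on 4 threads, as in the question.
        System.out.println(indexAll(1, 1_000_000, 10_000, 4));
    }
}
```

In a real indexer each worker would need its own database connection, and the single optimize() call would still happen once, after all workers have finished.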
Disclosure: I am the author of LuSql.

-Glen Newton
http://zzzoot.blogspot.com/
http://lab.cisti-icist.nrc-cnrc.gc.ca/cistilabswiki/index.php/Glen_Newton

2009/10/22 Paul Taylor <paul_t...@fastmail.fm>:
> I'm building a Lucene index from a database, creating about 1 million
> documents; unsurprisingly, this takes quite a long time.
> I do this by sending a query to the db over a range of ids (10,000
> records),
> adding these results to Lucene,
> then getting the next 10,000, and so on.
> When indexing has completed I then call optimize().
> I also set indexWriter.setMaxBufferedDocs(1000) and
> indexWriter.setMergeFactor(3000), but don't fully understand these values.
> Each document contains about 10 small fields.
>
> I'm looking for some ways to improve performance.
>
> This index writing is single-threaded; is there a way I can multi-thread
> writing to the index?
> I only call optimize() once at the end; is that the best way to do it?
> I'm going to run a profiler over the code, but are there any rules of thumb
> for the best values to set for MaxBufferedDocs and MergeFactor?
>
> thanks Paul
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org