On Mon, Jun 13, 2011 at 8:13 PM, Erick Erickson <erickerick...@gmail.com> wrote:
> I half remember that this has come up before, but I couldn't find the
> thread. I was running some tests over the weekend that involved
> indexing 1.9M documents from the English Wiki dump.
>
> I'm consistently seeing that trunk takes about twice as long to index
> the docs as 1.4, 3.2 and 3x. Optimize is also taking quite a bit
> longer. I admit that these aren't very sophisticated tests, and I only
> ran the trunk process twice (although both those were consistent).
>
> I'm pretty sure my rambuffersize and autocommit settings are
> identical. I remove the data/index directory before each run. These
> results are from running the indexing program in IntelliJ on my Mac;
> both the server and the indexing program were running locally.
>
> No, trunk isn't compiling before running <G>.
>
> Here's the server definition:
> new StreamingUpdateSolrServer(url, 10, 4);
>
> and I'm batching up the documents and sending them to Solr in batches of
> 1,000.
>
> So, my question is whether this should be pursued. Note that I'm still
> getting around 3K docs/second, which I can't complain about. Not that
> that stops me, you understand. And in return for a memory footprint
> reduction from 389M to 90M after some off-the-wall sorting and
> faceting I'll take it!
>
> Hmmmm, speaking of which, the memory usage changes seem like a good
> candidate for a page on the Wiki, anyone want to suggest a home?
>
>
> Solr 1.4.1
> Total Time Taken-> 257 seconds
> Total documents added-> 1917728
> Docs/sec-> 7461
> starting optimize
> optimizing took 26 seconds
>
> Solr 3.2
> Total Time Taken-> 243 seconds
> Total documents added-> 1917728
> Docs/sec-> 7891
> starting optimize
> optimizing took 21 seconds
>
> Solr 3x
> Total Time Taken-> 269 seconds
> Total documents added-> 1917728
> Docs/sec-> 7129
> starting optimize
> optimizing took 21 seconds
>
> Solr trunk. 2011-6-11: 17:24 EST
> Total Time Taken-> 592 seconds
> Total documents added-> 1917728
> Docs/sec-> 3239
> starting optimize
> optimizing took 159 seconds
>
> What do folks think? Is there anything I can/should do to narrow this down?

Hi Erick,

this looks weird. I have some questions:

- Are you indexing into the same disk as you read the data from?
- What are your rambuffer settings?
- How many threads are you using to send data to Solr?
- What is your autocommit setting?

simon

>
> Erick
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
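
For anyone comparing the runs quoted above, the Docs/sec figures and the size of the trunk slowdown can be recomputed from the raw totals. The following is only a minimal sketch in plain Java: the class and method names (IndexingThroughput, docsPerSecond, slowdown) are hypothetical and not part of Solr or SolrJ, and it assumes the reported Docs/sec values were produced by integer division, which is consistent with the four numbers in the report.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Solr or SolrJ: recomputes the
// throughput figures from the totals quoted in the thread.
final class IndexingThroughput {

    /** Whole documents per second; integer division truncates the result. */
    static long docsPerSecond(long totalDocs, long totalSeconds) {
        if (totalSeconds <= 0) {
            throw new IllegalArgumentException("totalSeconds must be positive");
        }
        return totalDocs / totalSeconds;
    }

    /** How many times longer a run took than a baseline run. */
    static double slowdown(long seconds, long baselineSeconds) {
        if (baselineSeconds <= 0) {
            throw new IllegalArgumentException("baselineSeconds must be positive");
        }
        return (double) seconds / baselineSeconds;
    }

    public static void main(String[] args) {
        long totalDocs = 1_917_728L; // same document count in every run

        // Indexing times in seconds, as reported in the thread.
        Map<String, Long> indexSeconds = new LinkedHashMap<>();
        indexSeconds.put("1.4.1", 257L);
        indexSeconds.put("3.2", 243L);
        indexSeconds.put("3x", 269L);
        indexSeconds.put("trunk", 592L);

        long trunkSeconds = indexSeconds.get("trunk");
        for (Map.Entry<String, Long> run : indexSeconds.entrySet()) {
            System.out.printf("Solr %-6s %5d docs/sec; trunk took %.2fx as long to index%n",
                    run.getKey(),
                    docsPerSecond(totalDocs, run.getValue()),
                    slowdown(trunkSeconds, run.getValue()));
        }
    }
}
```

The same slowdown helper can be applied to the optimize times (159 seconds on trunk against 21 to 26 seconds on the other branches) to compare that step separately from indexing.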