Alexey Serba pointed me to an issue he was seeing last night. He said that when he used an older version of Solr to index millions of docs, memory usage stayed quite low - but with a recent trunk version, memory usage skyrocketed. No OOM that I have heard of or seen yet, but rather than cycling between 50 and a couple hundred megabytes of RAM, usage jumps up to whatever is available. It doesn't drop back down until you do a commit.
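
For context, here is roughly what Solr's update path boils down to at the Lucene level - every add with a uniqueKey goes through IndexWriter.updateDocument, which buffers a delete term for the id until the next flush or commit. A minimal sketch against the 3.x API (class name, index path, and field name are made up):

  import java.io.File;
  import org.apache.lucene.analysis.standard.StandardAnalyzer;
  import org.apache.lucene.document.Document;
  import org.apache.lucene.document.Field;
  import org.apache.lucene.index.IndexWriter;
  import org.apache.lucene.index.Term;
  import org.apache.lucene.store.FSDirectory;
  import org.apache.lucene.util.Version;

  public class DeleteRamSketch {
    public static void main(String[] args) throws Exception {
      IndexWriter writer = new IndexWriter(
          FSDirectory.open(new File("/tmp/delete-ram-test")),
          new StandardAnalyzer(Version.LUCENE_30),
          IndexWriter.MaxFieldLength.UNLIMITED);
      writer.setRAMBufferSizeMB(32); // flushes should kick in around here
      for (int i = 0; i < 13000000; i++) {
        String id = Integer.toString(i); // a fresh id for every doc
        Document doc = new Document();
        doc.add(new Field("id", id, Field.Store.YES,
            Field.Index.NOT_ANALYZED));
        // delete-then-add on the id term; the delete is buffered in
        // RAM until the writer flushes or you commit
        writer.updateDocument(new Term("id", id), doc);
        if (i % 1000000 == 0) {
          System.out.println(i + ": ram=" + writer.ramSizeInBytes());
        }
      }
      writer.commit(); // buffered deletes get applied and released here
      writer.close();
    }
  }

If the buffered delete terms aren't counted against that RAM buffer, the writer would never decide to flush them, which would match the growth-until-commit behavior Alexey is seeing.
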
Interested, I started indexing millions of docs with my benchmark work, and I didn't see the problem. Based on some early profiling by Alexey, it looked like buffered deletes were involved (by default, Solr always uses update to maintain unique ids). I indexed about 13 million docs, and RAM usage looked fine.

After a bit of digging, though, I saw that the doc maker was not assigning ids sequentially for some reason - it was assigning the same id a bunch of times in a row before incrementing it. Odd - so I fixed it to increment on every document, and now I see the problem right away. Memory consumption just goes up, up, up and tops out near the max available.

Still investigating. I have not tried with pure Lucene yet, but so far it looks like a pure Lucene issue to me. I see that in late June Mike fixed something related to buffered deletes - perhaps there is still something off in how RAM usage is tracked for deletes?

- Mark Miller
lucidimagination.com
