Alexey Serba pointed out an issue to me last night. He said that when he used
an older version of Solr to index millions of docs, memory usage stayed quite
low - but with a recent trunk version, memory usage skyrocketed. No OOM that I
have heard of or seen yet, but rather than cycling between 50 and a couple
hundred megabytes of RAM, usage jumps up to whatever is available. It doesn't
drop back down until you do a commit.

Interested, I started indexing millions of docs with my benchmark work. And I
didn't see the problem. Based on some early profiling by Alexey, it looked
like buffered deletes were involved (by default, Solr always uses update to
maintain unique ids). I indexed about 13 million docs, and RAM usage looked
fine. After a bit of digging, though, I saw that the doc maker was not
assigning ids sequentially for some reason - it was assigning the same id a
bunch of times in a row before incrementing it. Odd - so I fixed it to
increment on every document. And now I see the problem right away: memory
consumption just goes up, up, up and tops out near the max available.
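
To make that concrete, here is roughly the pure-Lucene equivalent of what Solr
is doing (a minimal sketch, not the actual benchmark code - the path, analyzer,
and doc count are just placeholders). updateDocument buffers a delete term for
every doc, which is what Solr's unique id handling boils down to:

  import java.io.File;

  import org.apache.lucene.analysis.WhitespaceAnalyzer;
  import org.apache.lucene.document.Document;
  import org.apache.lucene.document.Field;
  import org.apache.lucene.index.IndexWriter;
  import org.apache.lucene.index.Term;
  import org.apache.lucene.store.FSDirectory;

  public class BufferedDeletesRepro {
    public static void main(String[] args) throws Exception {
      IndexWriter writer = new IndexWriter(
          FSDirectory.open(new File("/tmp/deltest")),
          new WhitespaceAnalyzer(),
          IndexWriter.MaxFieldLength.UNLIMITED);
      // flushing should keep the heap near this if accounting is right
      writer.setRAMBufferSizeMB(32);
      for (int i = 0; i < 13000000; i++) {
        // increment the id on every doc - the doc maker bug was
        // repeating the same id many times in a row, which hid this
        String id = Integer.toString(i);
        Document doc = new Document();
        doc.add(new Field("id", id, Field.Store.YES,
            Field.Index.NOT_ANALYZED));
        // update rather than add, so a delete Term is buffered per doc
        writer.updateDocument(new Term("id", id), doc);
        if (i % 500000 == 0) {
          System.out.println(i + " docs, used heap: "
              + (Runtime.getRuntime().totalMemory()
                 - Runtime.getRuntime().freeMemory()) / (1024 * 1024)
              + "MB");
        }
      }
      writer.close();
    }
  }

If the heap keeps climbing well past the RAM buffer size with this loop, but
stays flat when you swap updateDocument for addDocument, that would point
squarely at the delete accounting.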

Still investigating. I have not tried with pure Lucene yet, but it looks like
a pure Lucene issue to me so far. I see that in late June Mike fixed something
related to buffered deletes - perhaps there is still something off in how RAM
usage is tracked for deletes?
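
If it is the delete accounting, capping the buffered delete term count might
be a workable stopgap while we dig - a guess on my part, assuming the
term-count flush trigger still fires even when the RAM accounting is off:

  // hypothetical workaround, untested - flush buffered deletes every
  // 1000 update/delete terms instead of relying on the RAM accounting
  writer.setMaxBufferedDeleteTerms(1000);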

- Mark Miller
lucidimagination.com
