Erick,

On Thu, May 9, 2013 at 10:28 PM, Erick Erickson <erickerick...@gmail.com> wrote:
> Yeah, this is getting warmer. Some of the
> CompressingStoredFieldsReader objects are 240M. The documents aren't
> nearly that large I don't think (but I'll verify).

Wow, 240M is a lot! Would you be able to tell which member of these
instances is making them so large? Since you mentioned threading, my
first thought was that the culprit is the buffer used for
decompression, but it could be the fields index as well (which is
held in memory but should be shared across instances for the same
segment, so threading shouldn't make the situation worse there).

> But still, over 700 of these objects live at once? I _think_ I'm
> seeing the number go up significantly when the number of indexing
> threads increases, but that's a bit of indirect evidence. My other
> question would be whether you'd expect the number of these objects to
> go up as the number of segments goes up, i.e. I assume they're
> per-segment....

Indeed, these objects are managed per-segment, so increasing the
number of segments increases the number of stored fields reader
instances.
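
If it helps to check, here is a minimal sketch of how to see the
segment count (written against a recent Lucene API, so the exact
signatures differ a bit from 4.x); each leaf below is one segment,
and each segment gets its own stored fields reader once stored
fields are read:

import java.nio.file.Paths;

import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class SegmentCount {
  public static void main(String[] args) throws Exception {
    // Open the index directory given on the command line.
    try (Directory dir = FSDirectory.open(Paths.get(args[0]));
         DirectoryReader reader = DirectoryReader.open(dir)) {
      // Each leaf corresponds to one segment; a stored fields reader is
      // created per segment, so more segments means more reader instances.
      System.out.println("segments: " + reader.leaves().size());
    }
  }
}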

> So the pattern here is atomic updates on documents where some of the
> fields get quite large. So the underlying reader has to decompress
> these a lot. Do you have any suggestions how to mitigate this? Other
> than "don't do that<G>"....

As Savia suggests, lowering the number of indexing threads should
help. If the merge factor is high, lowering it could help as well. On
my end, I'll change the stored fields reader so that it no longer
reuses the buffer for decompression
(https://issues.apache.org/jira/browse/LUCENE-4995); even if it turns
out not to be the problem here, it shouldn't hurt.
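
To make the merge factor suggestion concrete, here is a rough sketch
on the plain Lucene side (again against a recent API; the
constructors took a Version argument in 4.x, and Solr exposes a
similar knob in solrconfig.xml's indexConfig):

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.LogByteSizeMergePolicy;
import org.apache.lucene.store.Directory;

public class WriterSetup {
  static IndexWriter newWriter(Directory dir) throws Exception {
    IndexWriterConfig cfg = new IndexWriterConfig(new StandardAnalyzer());
    // A lower merge factor keeps the segment count down, and with it the
    // number of per-segment stored fields readers, at the cost of more merging.
    LogByteSizeMergePolicy mp = new LogByteSizeMergePolicy();
    mp.setMergeFactor(5);
    cfg.setMergePolicy(mp);
    return new IndexWriter(dir, cfg);
  }
}

The number of indexing threads is in the application's hands: fewer
threads calling addDocument/updateDocument concurrently means fewer
decompression buffers alive at the same time.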

-- 
Adrien

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
