Hi Vassil,

See comments below...
On Wed, Dec 9, 2009 at 8:46 AM, Vassil Dichev <[email protected]> wrote:
> Markus,
>
> First of all, a note about the KB/message statistics: this is only
> valid as long as you get messages from the cache! Currently the cache
> size is set to 10,000, so you will see a drop in memory usage for
> message numbers which exceed this size.

Yes, I expected this. But I think you agree that for high performance with
lots of users we need to take care that we can cache as much as possible.

> Processing messages would also necessarily become slower.
>
> The simplest strategies for the stemmer would be:
> 1. Move the stemmer to the companion object
> 2. Create a new stemmer every time it's needed
>
> By doing a naive test with 100,000 invocations of stem for the same
> stemmer and creating 100,000 stemmer objects, it seems that
> instantiation takes almost double the time. So I'm not sure contention
> would be much of an issue; besides, the only time a stemmer is needed
> is for search and the word frequency cloud. These are not specific to
> a particular message, so they can be (and should be) moved to the
> companion object, too.

Yes, that makes a lot of sense. Is the stemming currently done within the
thread that updates the UI? Stemming could then be batched (update the word
frequency cloud only every n messages). I would rather avoid creating a new
stemmer each time (see the sketch in the P.S. below).

> Furthermore, search is done in a Compass transaction anyway.

I've also seen that Lucene has some potential issues with finalizers, e.g.
it uses large finalizable objects (IndexWriter, IIRC). Is the index updated
for each message? I think it would also make sense to batch those updates
if possible (second sketch below).

> We could also have some type of pooling, but I'm not sure how
> efficient it would be. This definitely needs some benchmarks before we
> try to optimize too much.
>
> What do you think?

Yes. It's impossible to decide which tradeoffs to make as long as we don't
have an ESME instance running with enough active users (and with detailed
enough performance monitoring enabled). For now I would therefore go with
the easiest possible implementation: KISS!

Regards,
Markus
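P.S. A rough sketch of what sharing a single stemmer via the companion
object could look like. The names (MsgParser, PorterStemmer, stem) are
placeholders for whatever we actually have in ESME; the synchronized block
is only needed if the stemmer keeps mutable state between calls:

    // One shared stemmer instance instead of one per message/request.
    object MsgParser {
      private val stemmer = new PorterStemmer

      // Guard the call in case the stemmer is stateful and not thread-safe.
      def stem(word: String): String = stemmer.synchronized {
        stemmer.stem(word)
      }
    }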

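P.P.S. And a similarly rough sketch of batching the index updates. It
assumes we talk to a plain Lucene IndexWriter; Compass may want to manage
commits inside its own transactions, so take it only as an illustration of
the idea (commit once per batch instead of once per message):

    import org.apache.lucene.document.{Document, Field}
    import org.apache.lucene.index.IndexWriter

    // Buffers documents and commits only every batchSize additions.
    class BatchingIndexer(writer: IndexWriter, batchSize: Int) {
      private var pending = 0

      def index(msgText: String): Unit = synchronized {
        val doc = new Document
        doc.add(new Field("text", msgText, Field.Store.YES, Field.Index.ANALYZED))
        writer.addDocument(doc)
        pending += 1
        if (pending >= batchSize) {
          writer.commit()  // one commit per batch, not per message
          pending = 0
        }
      }
    }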