Markus,

First of all, a note about the KB/message statistics: they are only valid as long as you get messages from the cache! Currently the cache size is set to 10,000, so you will see a drop in memory usage for message counts that exceed this size. Processing messages would then also necessarily become slower.
The simplest strategies for the stemmer would be:

1. Move the stemmer to the companion object
2. Create a new stemmer every time it's needed

In a naive test comparing 100,000 invocations of stem on the same stemmer against creating 100,000 stemmer objects, instantiation seems to take almost twice as long. So I'm not sure contention would be much of an issue. Besides, the only time a stemmer is needed is for search and the word frequency cloud. These are not specific to a particular message, so they can be (and should be) moved to the companion object, too. Furthermore, search is done in a Compass transaction anyway.

We could also have some type of pooling, but I'm not sure how efficient it would be. This definitely needs some benchmarks before we try to optimize too much.

What do you think?
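To make the two strategies concrete, here is a rough sketch of what I mean, together with the kind of naive timing loop described above. Note that ToyStemmer is only a hypothetical stand-in for the real stemmer (it just strips a trailing "s"), and SharedStemmer, FreshStemmer and StemmerTiming are made-up names, so the absolute numbers it prints say nothing about our actual code.

```scala
// Hypothetical stand-in for the real stemmer. It keeps mutable state,
// which is why a shared instance has to be guarded against concurrent use.
class ToyStemmer {
  private val buffer = new java.lang.StringBuilder

  def stem(word: String): String = {
    buffer.setLength(0)
    buffer.append(word.toLowerCase)
    // Toy rule only: strip one trailing "s". The real stemmer does far more.
    if (buffer.length() > 1 && buffer.charAt(buffer.length() - 1) == 's')
      buffer.setLength(buffer.length() - 1)
    buffer.toString
  }
}

// Strategy 1: one stemmer held by a singleton (standing in for the
// companion object), with access serialized via synchronized.
object SharedStemmer {
  private val stemmer = new ToyStemmer
  def stem(word: String): String = synchronized { stemmer.stem(word) }
}

// Strategy 2: a new stemmer for every call, so no locking is needed.
object FreshStemmer {
  def stem(word: String): String = new ToyStemmer().stem(word)
}

// Naive timing loop: calls f n times on the same word, returns elapsed ms.
object StemmerTiming {
  def timeMillis(n: Int)(f: String => String): Long = {
    val start = System.nanoTime()
    var i = 0
    while (i < n) { f("messages"); i += 1 }
    (System.nanoTime() - start) / 1000000
  }

  def main(args: Array[String]): Unit = {
    val n = 100000
    println("shared stemmer: " + timeMillis(n)(SharedStemmer.stem) + " ms")
    println("fresh stemmer:  " + timeMillis(n)(FreshStemmer.stem) + " ms")
  }
}
```

A single-threaded loop like this has no JIT warm-up and no competing threads, so it cannot show contention on the shared instance. A proper benchmark would need both before we draw conclusions.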
