> I haven't looked in detail into last night's profiling results, but it seems
> we are down to 4.6 Mbyte! That's a 5000x improvement!
> I plan to document the details on Monday night. If there's time left I will
> also start with drafting a longer blog post. It would be great if Vassil
> would provide a short description/explanation of his changes.
OK, this is going to be embarrassing for me, but this is not actually an improvement, but a return to the performance capabilities of ESME from several months ago. I'm not surprised that it was 5000x worse: every time the public or friends' timeline was displayed for any user, every message was fetched from the database, converted to XML, and transformed into XHTML and JSON. Not only that, but every time a new message was received, the timelines of all users receiving that message were rerendered, which meant reloading from the DB and repeating the same XML acrobatics for all 20 messages of each of the 2 timelines, i.e. 40 messages processed per user. To top it off, when the Textile parser was activated, its overhead was multiplied 40 times per user, which for 300 users means 12,000 messages rerendered, just because one user decided to send a message!

Yes, this sounds horrible. David was indeed correct that the Textile parser itself was not the main culprit; it was just magnifying the effects of a more serious bug. The problem was the Message.findMessages method, which is supposed to cache messages based on an LRU strategy. When I introduced access pools, access had to be checked not only when loading messages from the DB, but also when serving them from the cache. So I discarded the restricted messages from the temporary structure that was to be returned to the user, and the discarded messages would then go to the finder method, whose query made sure only messages from valid pools were returned (inefficiency one). Furthermore, I introduced a bug where messages from the public pool would also always be discarded from the cache and fetched from the DB (inefficiency two). So everything still worked functionally, but the cache was effectively never used (see the sketch at the end of this post).

In conclusion, this is just one more argument for keeping messages in memory instead of fetching them from the DB. Another important conclusion is that performance tests are just as important as unit and integration tests, and they can uncover functional problems too, especially with caches.
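To make the findMessages problem a bit more concrete, here is a minimal Scala sketch of the intended behaviour: an LRU message cache that applies the access-pool check to cached entries in memory, instead of discarding them and going back to the database. This is not the actual ESME code; the names MessageCache, Msg, findInDb and visiblePools are made up for illustration, and the real Message.findMessages is considerably more involved.

    // Simplified, hypothetical sketch -- not the real Message.findMessages.
    import scala.collection.mutable

    case class Msg(id: Long, pool: Option[Long], text: String)

    class MessageCache(maxSize: Int, findInDb: Long => Option[Msg]) {

      // Insertion-ordered map; re-inserting on access gives simple LRU behaviour.
      private val cache = mutable.LinkedHashMap.empty[Long, Msg]

      private def touch(id: Long, msg: Msg): Unit = {
        cache.remove(id)              // move the entry to the "most recent" end
        cache.put(id, msg)
        if (cache.size > maxSize)     // evict the least recently used entry
          cache.remove(cache.head._1)
      }

      // Return the requested messages, restricted to the pools the caller may see.
      // The pool check is done against the in-memory objects: a cached message
      // the caller may not see is simply filtered out, never re-fetched from the DB.
      def findMessages(ids: Seq[Long], visiblePools: Set[Long]): Seq[Msg] = {
        val loaded = ids.flatMap { id =>
          cache.get(id) match {
            case Some(msg) => touch(id, msg); Some(msg)                  // cache hit
            case None      => findInDb(id).map { m => touch(id, m); m }  // miss -> DB
          }
        }
        // Public messages (no pool) are always visible and stay cached like any other.
        loaded.filter(m => m.pool.forall(p => visiblePools.contains(p)))
      }
    }

With this structure a cache hit never turns into a database round trip, and public messages are cached like everything else, which is exactly what the two inefficiencies above were defeating.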
