On Sun, Nov 29, 2009 at 5:03 AM, Vassil Dichev <[email protected]> wrote:
> > I haven't looked in detail into last night's profiling results, but it seems we are down to 4.6 Mbyte! That's a 5000x improvement!
> >
> > I plan to document the details on Monday night. If there's time left I will also start with drafting a longer blog post. It would be great if Vassil would provide a short description/explanation of his changes.
>
> OK, this is going to be embarrassing for me, but this is not actually an improvement, but a return to the performance capabilities of ESME from several months ago.

Vassil, I am sorry you feel embarrassed. While bugs happen, your effective code-to-bug ratio is quite excellent... and this is the kind of bug we like to have: easily identifiable (thanks Markus for your excellent tests) and easily fixable.

So, in the future, we definitely need more tests (both as part of the development process and integration/performance tests). We also need to work together to address the results of the tests. Results of a single test should not be viewed as a repudiation of a design. Tests should be invitations either to fix a bug (as in this case) or, in the event that a bug cannot be fixed without significant refactoring, to have a reasoned discussion of the merits and likely performance implications of another design.

I am happy that you found the root cause of the problem and that you fixed it quickly. That's what counts.

Thanks,

David

> I'm not surprised that it was 5000x worse, because every time the public/friends' timeline was displayed for any user, every message was fetched from the database, converted to XML, and transformed into XHTML and JSON... Not only that, but every time a new message was received, it would force the timelines of all users receiving the message to be rerendered, which means reloading from the DB and the same XML acrobatics for all 20 messages in each of the 2 timelines, i.e. 40 messages processed for each user.
>
> To top it off, when the Textile parser was activated, its overhead was multiplied 40 times per user, which for 300 users means 12000 messages rerendered, just because one user decided to send a message! Yes, this sounds horrible. David was indeed correct that the Textile parser itself was not the main culprit, but was just magnifying the effects of a more serious bug.
>
> The problem was the Message.findMessages method. It is supposed to cache messages based on an LRU strategy. When I introduced access pools, messages had to be access-controlled not only when loading them from the DB, but also when serving them from the cache. So I discarded the messages from the temporary structure which was to be returned to the user. The discarded messages would then go through the finder method, whose query made sure only messages from valid pools were returned (inefficiency one). Furthermore, I introduced a bug where messages from the public pool would also always be discarded from the cache and fetched from the DB (inefficiency two). So everything still worked, but the cache wasn't used in practice.
>
> In conclusion, this is just one more argument for keeping messages in memory, instead of fetching them from the DB.
>
> Another important conclusion is that performance tests are just as important as unit and integration tests and can uncover functional problems too, especially with caches.
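To make the fix concrete, here is a minimal sketch of the intended behaviour: a pool-aware LRU cache where access control is applied to the in-memory copy, so only genuine cache misses fall through to the database. All names here (Msg, MessageCache, loadFromDb) are illustrative, not the actual ESME Message/Mapper API.

    import scala.collection.mutable

    case class Msg(id: Long, poolId: Long, text: String)

    class MessageCache(maxSize: Int, loadFromDb: Long => Option[Msg]) {
      // A LinkedHashMap keeps insertion order; re-inserting on every hit turns
      // it into a simple LRU structure (oldest entry = least recently used).
      private val cache = new mutable.LinkedHashMap[Long, Msg]

      def findMessages(ids: Seq[Long], visiblePools: Set[Long]): Seq[Msg] = synchronized {
        ids.flatMap { id =>
          // 1. Try the cache first and refresh the LRU order on a hit.
          val cached = cache.remove(id).map { m => cache.put(id, m); m }

          // 2. Only a genuine miss falls through to the database. (The buggy
          //    version also refetched pool-restricted and public-pool messages.)
          val msg = cached.orElse {
            loadFromDb(id).map { m =>
              cache.put(id, m)
              if (cache.size > maxSize) cache.remove(cache.head._1) // evict oldest
              m
            }
          }

          // 3. Pool-based access control is applied to the in-memory copy, so a
          //    cached message never needs a second DB round trip.
          msg.filter(m => visiblePools.contains(m.poolId))
        }
      }
    }

The point of the sketch is only that pool filtering happens on the cached objects rather than by discarding them and re-querying; it does not claim to match the real findMessages signature.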
--
Lift, the simply functional web framework http://liftweb.net
Beginning Scala http://www.apress.com/book/view/1430219890
Follow me: http://twitter.com/dpp
Surf the harmonics
