I have never worked on a project before which can show a 5000x improvement in a matter of days :)
Dick:
I'm really curious to see how that impacts performance - maybe we should try another Stax test at some point...
<< as in next week? Or after Boston?
/Anne
On 29. nov. 2009, at 15.35, David Pollak wrote:
On Sun, Nov 29, 2009 at 5:03 AM, Vassil Dichev <[email protected]> wrote:
I haven't looked in detail into last night's profiling results, but it seems we are down to 4.6 Mbyte! That's a 5000x improvement!
I plan to document the details on Monday night. If there's time left, I will also start drafting a longer blog post. It would be great if Vassil would provide a short description/explanation of his changes.
OK, this is going to be embarrassing for me, but this is not actually an improvement; it is a return to the performance capabilities of ESME from several months ago.
Vassil,
I am sorry you feel embarrassed. While bugs happen, your effective code-to-bug ratio is quite excellent... and this is the kind of bug we like to have: easily identifiable (thanks, Markus, for your excellent tests) and easily fixable.
So, in the future, we definitely need more tests (both as part of the development process and as integration/performance tests). We also need to work together to address the results of the tests. Results of a single test should not be viewed as a repudiation of a design. Tests should be invitations either to fix a bug (as in this case) or, when a bug cannot be fixed without significant refactoring, to have a reasoned discussion of the merits and likely performance implications of another design.
I am happy that you found the root cause of the problem and that you fixed it quickly. That's what counts.
Thanks,
David
I'm not surprised that it was 5000x worse, because every time the public/friends' timeline was displayed for any user, every message was fetched from the database, converted to XML, and transformed into XHTML and JSON. Not only that, but every time a new message was received, it would force the timelines of all users who received the message to be re-rendered, which meant again reloading from the DB and doing the same XML acrobatics for all 20 messages of the 2 timelines, so 40 messages were processed for each user.
To top it off, when the Textile parser was activated, its overhead was multiplied 40 times per user, which for 300 users means 12,000 messages re-rendered, just because one user decided to send a message! Yes, this sounds horrible. David was indeed correct that the Textile parser itself was not the main culprit; it was just magnifying the effects of a more serious bug.
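To put rough numbers on that fan-out, here is a minimal, self-contained sketch of the re-rendering pattern described above. All names here are hypothetical stand-ins, not the actual ESME code; render() just uppercases where the real code did the XML/XHTML/JSON and Textile work.

case class Message(id: Long, text: String)

object NaiveFanOut {
  val messagesPerTimeline = 20
  val timelinesPerUser = 2 // public + friends

  // Stand-in for the DB fetch that happened on every timeline display.
  def loadFromDb(user: Long, timeline: Int): List[Message] =
    (1L to messagesPerTimeline).map(i => Message(i, "msg " + i)).toList

  // Stand-in for the per-message XML -> XHTML/JSON conversion and Textile parsing.
  def render(m: Message): String = m.text.toUpperCase

  // One new message forced both timelines of every recipient to be rebuilt.
  def onNewMessage(recipients: Seq[Long]): Int = {
    var rendered = 0
    for (user <- recipients; t <- 1 to timelinesPerUser)
      loadFromDb(user, t).foreach { m => render(m); rendered += 1 }
    rendered
  }

  def main(args: Array[String]): Unit =
    // 20 messages x 2 timelines x 300 recipients = 12000 renders per send
    println(onNewMessage(1L to 300L))
}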
The problem was in the Message.findMessages method. It is supposed to cache messages based on an LRU strategy. When I introduced access pools, messages had to be access-controlled not only when loading them from the DB, but also when serving them from the cache. So I discarded the messages from the temporary structure which had to be returned to the user. The discarded messages would go to the finder method, where the constructed query would make sure only messages from valid pools would be returned (inefficiency one). Furthermore, I allowed a bug where messages from the public pool would also always be discarded from the cache and fetched from the DB (inefficiency two). So stuff would work, but the cache wasn't used in practice.
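As a rough illustration of the fix direction, here is a sketch with my own names, a LinkedHashMap-based LRU, and a simplified pool model (not the actual findMessages code): the access-control filter runs over the cached copies, so only genuine cache misses fall through to the DB.

case class Message(id: Long, pool: Option[Long], text: String)

class MessageCache(capacity: Int) {
  // An access-ordered LinkedHashMap gives simple LRU eviction.
  private val lru = new java.util.LinkedHashMap[Long, Message](16, 0.75f, true) {
    override def removeEldestEntry(e: java.util.Map.Entry[Long, Message]): Boolean =
      size() > capacity
  }
  def get(id: Long): Option[Message] = Option(lru.get(id))
  def put(m: Message): Unit = lru.put(m.id, m)
}

object MessageFinder {
  val cache = new MessageCache(1000)
  var dbLoads = 0 // instrumentation for the test sketch further down

  // Instead of discarding cached messages that need a pool check (and all
  // public-pool messages), filter the cached copies themselves.
  def findMessages(ids: Seq[Long], validPools: Set[Long]): Seq[Message] = {
    val cached = ids.flatMap(id => cache.get(id)).toList
    val hitIds = cached.map(_.id).toSet
    val fromDb = loadFromDb(ids.filterNot(hitIds)) // only true misses hit the DB
    fromDb.foreach(cache.put)
    // pool = None models a public message; pooled ones must be in validPools
    (cached ++ fromDb).filter(m => m.pool.forall(validPools.contains))
  }

  def loadFromDb(ids: Seq[Long]): Seq[Message] = {
    dbLoads += ids.size
    ids.map(id => Message(id, None, "msg " + id)) // stand-in for the real query
  }
}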
In conclusion, this is just one more argument for keeping messages in memory instead of fetching them from the DB.
Another important conclusion is that performance tests are just as important as unit and integration tests, and they can uncover functional problems too, especially with caches.
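On that last point, a cache regression like this one does not even need a load-test rig to show up: a functional check that counts DB loads across a cold and a warm call would have caught it. A sketch against the hypothetical MessageFinder above:

object CacheRegressionCheck {
  def main(args: Array[String]): Unit = {
    val ids = (1L to 20L).toList
    MessageFinder.findMessages(ids, Set.empty[Long]) // cold call fills the cache
    val warmStart = MessageFinder.dbLoads
    MessageFinder.findMessages(ids, Set.empty[Long]) // warm call should not touch the DB
    assert(MessageFinder.dbLoads == warmStart,
      "cache bypassed: the warm call still hit the DB")
  }
}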
--
Lift, the simply functional web framework http://liftweb.net
Beginning Scala http://www.apress.com/book/view/1430219890
Follow me: http://twitter.com/dpp
Surf the harmonics