To continue the story: I spent this week trying to do a more surgical analysis of performance in the problematic cases we've uncovered. First, I had to switch to something less broken than CLRProfiler for profiling. I like NProf, but it suffers from one major problem: there's no way to say "start profiling now", which means there's no way to ignore the bit where the cache is being populated. Although interesting in its own right, that data obscures the part I'm actually trying to optimize for.
To make a long story medium, I wound up doing two things:

1) Writing a driver console program that loads the FlexWiki engine and renders a single topic (there's a rough sketch of the shape of it below). It does it twice: once to warm up the cache, and once to measure perf against the populated cache.

2) Modifying NProf to discard the data from the warmup run.

So I now have a way to profile single topics, which is nice, because it gives me data in about a minute rather than three hours, not to mention the fact that I can drill down and see exactly which methods are contributing most to execution time.

Based on this new and excellent data, I was able to add another pretty major performance optimization. It turns out I wasn't caching the existence of a namespace, which meant that every time you asked the Federation for a NamespaceManager, it went all the way down to the FileSystemStore. Obviously, any file I/O (let alone dozens of reads per request) is a real performance killer.

The end result of all this is that I can get DarrenSQLIS to render quite a bit faster. Here are some stats to give you an idea:

  2.0.0.138-uncached: 61s
  2.0.0.138-cached:   38s
  Latest-uncached:    52s
  Latest-cached:      28s

where "Latest" is the engine with the changes I've made but not yet checked in. So you can see it's a pretty big improvement over the current 2.0 code. There's a fair amount of noise in there, too, so it varies a bit from run to run, but it's definitely better.

Of course, it's still pretty darn slow, especially compared to the 11 seconds that page takes to render the *first* time under 1.8, to say nothing of the 338 milliseconds it manages in the best case.

The issue now is that the performance is becoming much harder to optimize without architectural changes. It used to be the case that I could look at the perf numbers and see that one function was responsible for 75% of the time. As I've hacked it faster and faster, the bottlenecks have disappeared one by one, and the slowness is sort of...diffusing. Now there are lots of places that each contribute a little bit.

I still have a few ideas about how to get some pretty big wins. One place that still contributes a good portion of the time to process a request is the AuthorizationProvider. Given that it has to examine properties at the namespace, wiki, and topic level, that's not too surprising. The big question is whether there's any reasonable way to cache some of the calls to HasPermission, maybe just within a single request.

Another thing I want to look at is the implementation of Sort in WikiTalk. Our performance killer right now is Sort against large datasets. I believe that's because quicksort (the algorithm apparently in use) does O(n log n) comparisons, and the cost is dominated by the comparison operation, which appears to be a dynamic evaluation of the WikiTalk objects in the collection being sorted. That's a lot of evaluations, and they're each individually pretty slow (Reflection, lots of lookups, etc.). My hope is that I can do something clever like evaluating the collection objects once ahead of time and then sorting that, rather than dynamically evaluating on every comparison. Yes, that's a bit like the caching that David talked about, but the difference is that we can keep it localized to the WikiTalk code, and specific to one request. We'll see: I haven't looked at the code yet to see how easy that's going to be, or even if it's possible.
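To make the idea concrete, here's the general shape of what I mean, written against plain .NET types rather than the real WikiTalk classes -- EvaluateSortKey and the rest are made-up stand-ins for the expensive dynamic evaluation, not actual engine methods:

    using System;
    using System.Collections.Generic;

    class SortSketch
    {
        // Stand-in for the expensive per-object dynamic evaluation
        // (Reflection, member lookups, etc.). Not a real WikiTalk method.
        static IComparable EvaluateSortKey(object wikiTalkObject)
        {
            return wikiTalkObject.ToString();
        }

        // Evaluate each element's key exactly once, then sort the key/element
        // pairs. The O(n log n) comparisons only touch the precomputed keys,
        // so they stay cheap.
        static List<object> SortByPrecomputedKey(IEnumerable<object> items)
        {
            List<KeyValuePair<IComparable, object>> decorated =
                new List<KeyValuePair<IComparable, object>>();

            foreach (object item in items)
            {
                decorated.Add(new KeyValuePair<IComparable, object>(
                    EvaluateSortKey(item), item));
            }

            decorated.Sort(delegate(KeyValuePair<IComparable, object> a,
                                    KeyValuePair<IComparable, object> b)
            {
                return a.Key.CompareTo(b.Key);
            });

            List<object> sorted = new List<object>(decorated.Count);
            foreach (KeyValuePair<IComparable, object> pair in decorated)
            {
                sorted.Add(pair.Value);
            }
            return sorted;
        }
    }

That would trade roughly n log n expensive evaluations for n of them, which ought to matter a lot on big result sets -- assuming the WikiTalk comparison can actually be factored into a one-shot key like that.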
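As an aside, the driver program I mentioned at the top is nothing fancy; conceptually it's just the pattern below. RenderTopic here is a placeholder for whatever engine calls actually load the federation and format the topic -- it's not the real FlexWiki API.

    using System;
    using System.Diagnostics;

    class RenderDriver
    {
        // Placeholder for the engine call that loads the federation and
        // renders one topic to HTML. The real FlexWiki calls look different.
        static string RenderTopic(string ns, string topic)
        {
            return "";
        }

        static void Main(string[] args)
        {
            string ns = args[0];
            string topic = args[1];

            // First pass: populate the caches. This timing is deliberately
            // ignored (and the modified NProf throws its samples away too).
            RenderTopic(ns, topic);

            // Second pass: the number we actually care about -- rendering
            // against a warm cache.
            Stopwatch watch = Stopwatch.StartNew();
            RenderTopic(ns, topic);
            watch.Stop();

            Console.WriteLine("Cached render: {0} ms", watch.ElapsedMilliseconds);
        }
    }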
Also, I should probably try to optimize the AuthorizationProvider since that'll benefit every request anyway. The big question on that one is what assumptions I can make about the Thread.CurrentPrincipal object, since it will effectively be the cache key.

So, to summarize: work continues. :)
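P.S. To make the Thread.CurrentPrincipal idea a little more concrete, here's roughly the shape of per-request cache I'm picturing. All the names are made up -- this isn't the real AuthorizationProvider interface -- and it assumes the principal's Identity.Name is stable for the whole request, which is exactly the assumption I still need to verify.

    using System.Collections.Generic;
    using System.Threading;

    class RequestPermissionCache
    {
        // Placeholder for the existing slow path: walking the wiki, namespace,
        // and topic properties to answer a permission question.
        public delegate bool PermissionEvaluator();

        private readonly Dictionary<string, bool> _results =
            new Dictionary<string, bool>();

        public bool HasPermission(string topic, string action,
                                  PermissionEvaluator evaluate)
        {
            // The open question: is Identity.Name a safe, stable cache key for
            // the lifetime of a request? If the principal can change
            // mid-request, this key (or the whole idea) needs rethinking.
            string principal = Thread.CurrentPrincipal.Identity.Name;
            string key = principal + "|" + action + "|" + topic;

            bool allowed;
            if (!_results.TryGetValue(key, out allowed))
            {
                allowed = evaluate();
                _results[key] = allowed;
            }
            return allowed;
        }
    }

An instance of something like that would live in a request-scoped spot (HttpContext.Current.Items, say) so it gets thrown away when the request ends, which keeps it from ever serving stale answers across users or edits.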