Jeremy Wadsack wrote:
> Lucian Wischik ([EMAIL PROTECTED]; Saturday, March 01, 2003 3:52 AM):
>> I don't get the "cross-correlation" part. I don't want to combine
>> two reports, or do I?
> It's the memory requirement. If you have 10,000 unique requests on
> your site (not including separate query strings) and you have 16
> buckets in the processing time report, you now have to track 160,000
> unique combinations of processing-time -> request. This is even worse
> for things like host to referrer!
I don't think that's true; this isn't an O(n^2) problem.
Lucian Wischik got it right. It would require the memory necessary to store 50 URLs, a variable, and the program code, and one pass through the logfiles. However, it would also be possible to sum the execution time for each page and list the top 50 worst *average* processing times. But that is NOT what I'm talking about. That would indeed require some memory, but only proportional to the number of unique pages.
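Just to make that "average" variant concrete, here is a rough sketch in C++ with the STL. The whitespace-separated "page milliseconds" input is only a stand-in for real log parsing, and sorting the results to keep the 50 worst averages is left out:

    // Per-page accumulation: memory grows only with the number of unique pages.
    #include <iostream>
    #include <map>
    #include <string>

    int main()
    {
        std::map<std::string, std::pair<long, long>> perPage; // page -> (total ms, hits)

        std::string page; long ms;
        while (std::cin >> page >> ms) {        // stand-in for real log parsing
            perPage[page].first  += ms;
            perPage[page].second += 1;
        }
        for (const auto& p : perPage)           // average per page; top-50 sort omitted
            std::cout << p.first << " "
                      << p.second.first / p.second.second << " ms avg\n";
        return 0;
    }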
>> The original request was for a list of the top 50 worst performers. So, you
>> have a heap with 50 elements in it, each element a pair (time, name). For
>> every log entry you process, check whether its time is greater than the
>> quickest element on the heap, and if so, add it to the heap.
Exactly my intention. But you meant the slowest element on the list :-)
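For concreteness, a minimal sketch of that heap in C++ with the STL (the "ms name" input stands in for real log parsing, and 50 is just the requested report size):

    // Keep the 50 slowest requests seen so far in a min-heap, so the quickest
    // of the kept entries sits on top and is cheap to compare against and evict.
    #include <functional>
    #include <iostream>
    #include <queue>
    #include <string>
    #include <utility>
    #include <vector>

    int main()
    {
        typedef std::pair<long, std::string> Entry;   // (processing time in ms, request)
        std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> worst;

        long ms; std::string name;
        while (std::cin >> ms >> name) {
            if (worst.size() < 50)
                worst.push(Entry(ms, name));
            else if (ms > worst.top().first) {        // slower than the quickest kept entry
                worst.pop();                          // drop the current quickest
                worst.push(Entry(ms, name));
            }
        }
        while (!worst.empty()) {                      // prints quickest first
            std::cout << worst.top().first << " ms  " << worst.top().second << "\n";
            worst.pop();
        }
        return 0;
    }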
Does "worst performer" mean the requests that had the longest processing time? Or the fewest number of requests? If it's just longest processing time, then you are right. I though we were counting requests (which when counting requires that every item be tracked, as I said before).
Processing time; IIS can log it in milliseconds.
>> Alternatively, Jeremy, you were suggesting a set of buckets. If I understand
>> right, we'd see the worst few performers in the 1-2s range, the worst few
>> performers in the 2-5s range, and so on. This'd be just the same, except
>> with one heap per bucket.
> The bucket idea is just because the current Processing Time report is handled that way.
The bucket idea would actually be more useful to me. Overall performance would improve more if I could shave 0.5 seconds off a script that is called 50,000 times a day than if I cut five seconds off a script that is called ten times a day. But a report with only one bucket would probably be easier to understand.
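(To put numbers on that: 0.5 s saved on 50,000 calls is about 25,000 seconds a day, while 5 s saved on 10 calls is only 50 seconds.)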
>> Caveat: I don't know what exactly "processing time" is. At least, my
>> logfiles don't seem to include it. If it's not explicitly stored in the log,
>> and instead has to be calculated as the time between two separate
>> requests... well, that'd involve some separate processing beforehand.
> It comes from the log files. Apache lists it in seconds (without a fractional component), so it's pretty useless on that platform. IIS lists it in milliseconds (or centiseconds on some versions?). It's the time between when the request was received and when the response was submitted.
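(For reference, and from memory, so check the docs: with Apache's mod_log_config this is the %T directive in LogFormat, whole seconds only; Apache 2 also has %D for microseconds. IIS's W3C extended format calls the field time-taken. Something like:

    LogFormat "%h %l %u %t \"%r\" %>s %b %T" common_plus_time
    CustomLog logs/access_log common_plus_time

where "common_plus_time" is just an arbitrary nickname.)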
>> There's a different, unrelated cross-correlation program I wrote which
>> annotated the "Request Report" by adding, for each request, a list of the
>> top downloaders. You'd think this'd be an n^2 problem. But just run Analog
>> once the first time to get the request report, then run it a second time,
>> except on this second run it ignores everything but the requests it's been
>> told to look out for. The computational complexity of this second run is of
>> the same order as the first run. In practice, I didn't even bother writing
>> it properly, just stuck everything naively into STL containers, and it works
>> fine up to half a million log entries. The "host->referrer" you mention
>> would be like this.
> Well, I have to admit that my Big-O notation is really rusty and I usually just think of it in orders of infinity, but isn't that still an O(n^2) problem?
> But anyway, there are two ways to approach this problem. One is to try to do it all in a single pass, in which case the memory has to be available to hold a multivariate table of both dimensions (e.g. request vs. referrers). The other way is to run two passes on the log files. If there's lots of memory and a small number of unique items, the first approach is much faster (disk access is 1,000 times slower than memory access). If there are too many unique items or not enough memory, then the second method is the only alternative.
No, it's unrelated to the number of requests. It's more like a "dynamic" filter: if the processing time is greater than a certain value, then include the page in the list of worst performers. But pages might get kicked out of the list, and the threshold to get on the list will grow.
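As for the two-pass cross-correlation above, roughly what the second pass could look like, as a sketch only (C++/STL; the "host request" input, the example URLs, and the names are made up, and picking the top downloaders per request would just be a sort of each inner map):

    // Second pass only: "watched" is whatever the first pass (the normal
    // Request Report) produced. Every other log line is skipped, so this
    // pass stays proportional to the number of log lines.
    #include <iostream>
    #include <map>
    #include <set>
    #include <string>

    int main()
    {
        std::set<std::string> watched = { "/index.html", "/download/big.zip" };

        // request -> (host -> hits), kept only for watched requests
        std::map<std::string, std::map<std::string, long>> hostsPerRequest;

        std::string host, request;
        while (std::cin >> host >> request)     // stand-in for real log parsing
            if (watched.count(request))
                ++hostsPerRequest[request][host];

        for (const auto& r : hostsPerRequest)   // print per-request host counts
            for (const auto& h : r.second)
                std::cout << r.first << "  " << h.first << "  " << h.second << "\n";
        return 0;
    }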
> If you want to use the method you propose you might as well just do it with a Perl script (or STL program or whatever). There would be very little (if any) noticeable performance gain by building this logic into Analog.
Yes, it would be easy for me to do it in Perl, but I want it included in the pretty report!