Lucian Wischik ([EMAIL PROTECTED]; Saturday, March 01, 2003 3:52 AM):

>> > I don't get the "cross-correlation" part. I don't want to combine
>> > two reports, or do I?

>> It's the memory requirement. If you have 10,000 unique requests on
>> your site (not including separate query strings) and you have 16
>> buckets in the processing time report, you now have to track 160,000
>> unique combinations of processing-time -> request. This is even worse
>> for things like host to referrer!
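To put a concrete shape on that memory requirement: a single-pass
cross-tabulation needs one live counter per observed (request, bucket)
pair. Here's a minimal sketch; the container choice, names, and toy
bucket boundaries are mine, not Analog's actual internals:

    #include <iostream>
    #include <map>
    #include <string>
    #include <utility>

    int main() {
        // One counter per observed (request, time-bucket) combination.
        // With 10,000 unique requests and 16 buckets this can grow to
        // 160,000 live entries -- that's the memory cost at issue.
        std::map<std::pair<std::string, int>, long> crosstab;

        // Hypothetical log entries: (request, processing time in s).
        const std::pair<std::string, double> entries[] = {
            std::make_pair("/index.html", 0.4),
            std::make_pair("/search.cgi", 3.2),
            std::make_pair("/index.html", 1.7)};

        for (std::size_t i = 0;
             i < sizeof(entries) / sizeof(entries[0]); ++i) {
            // Toy bucket boundaries at 1s and 2s.
            int bucket = entries[i].second < 1.0 ? 0
                       : entries[i].second < 2.0 ? 1 : 2;
            ++crosstab[std::make_pair(entries[i].first, bucket)];
        }

        std::cout << "unique (request,bucket) combinations tracked: "
                  << crosstab.size() << "\n";  // prints 3 here
    }

Substitute host for request and referrer for bucket and the same table
explodes even faster, since both dimensions are then unbounded.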
> I don't think that's true. The problem isn't an n^2 problem.
> The original request was for a list of the top 50 worst performers.
> So, you have a heap with 50 elements in it, each element is a pair
> (time,name). For every log entry you process, check whether its time
> is greater than the quickest element on the heap, and if so, add it
> to the heap.

Does "worst performer" mean the requests that had the longest
processing time? Or the ones with the fewest requests? If it's just
longest processing time, then you are right (a rough sketch of that
heap approach is at the end of this message). I thought we were
counting requests, which requires that every item be tracked, as I
said before.

> Alternatively, Jeremy, you were suggesting a set of buckets. If I
> understand right, we'd see the worst few performers in the 1-2s
> range, the worst few performers in the 2-5s range, and so on. This'd
> be just the same, except with one heap per bucket.

The bucket idea is just because the current Processing Time report is
handled that way.

> Caveat: I don't know what exactly "processing time" is. At least, my
> logfiles don't seem to include it. If it's not explicitly stored in
> the log, and instead has to be calculated as the time between two
> separate requests... well, that'd involve some separate processing
> beforehand.

It comes from the log files: it's the time between when the request
was received and when the response was submitted. Apache lists it in
whole seconds (no fractional component), so it's pretty useless on
that platform. IIS lists it in milliseconds (or centiseconds on some
versions?).

> There's an unrelated cross-correlation program I wrote which
> annotated the "Request Report" by adding, for each request, a list
> of the top downloaders. You'd think this'd be an n^2 problem. But
> just run Analog once the first time to get the request report, then
> run it a second time, except on this second run it ignores
> everything but the requests it's been told to look out for. The
> computational complexity of this second run is of the same order as
> the first run. In practice, I didn't even bother writing it
> properly, just stuck everything naively into STL containers, and it
> works fine up to half a million log entries. The "host->referrer"
> you mention would be like this.

Well, I have to admit that my Big-O notation is really rusty and I
usually just think of it in orders of infinity, but isn't that still
an O(n^2) problem?

Anyway, there are two ways to approach this problem. One is to do it
all in a single pass, in which case enough memory has to be available
to hold a multivariate table of both dimensions (e.g. requests vs.
referrers, as in the crosstab sketch above). The other is to run two
passes over the log files. If there's lots of memory and a small
number of unique items, the first approach is much faster (disk
access is 1000 times slower than memory access). If there are too
many unique items, or not enough memory, then the second method is
the only alternative.

If you want to use the method you propose, you might as well just do
it with a Perl script (or an STL program, or whatever). There would
be very little (if any) noticeable performance gain from building
this logic into Analog.
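For reference, here's a minimal sketch of the fixed-size heap idea
quoted above, written with an STL priority_queue; the names and log
entries are hypothetical, not Analog code. One detail worth making
explicit: when a slower request arrives on a full heap, the quickest
tracked entry has to be popped, so the heap stays at 50 pairs.

    #include <iostream>
    #include <queue>
    #include <string>
    #include <utility>
    #include <vector>

    int main() {
        typedef std::pair<double, std::string> Entry;  // (time, name)
        const std::size_t N = 50;  // "top 50 worst performers"

        // Min-heap: the quickest tracked entry sits on top.
        std::priority_queue<Entry, std::vector<Entry>,
                            std::greater<Entry> > heap;

        // Hypothetical (processing time, request) log entries.
        const Entry entries[] = {
            Entry(0.3, "/a"), Entry(4.1, "/b"),
            Entry(2.2, "/c"), Entry(7.9, "/d")};

        for (std::size_t i = 0;
             i < sizeof(entries) / sizeof(entries[0]); ++i) {
            if (heap.size() < N) {
                heap.push(entries[i]);
            } else if (entries[i].first > heap.top().first) {
                heap.pop();             // evict the quickest entry...
                heap.push(entries[i]);  // ...to keep the heap at N
            }
        }

        // Pops in ascending time order; the slowest comes out last.
        while (!heap.empty()) {
            std::cout << heap.top().second << " "
                      << heap.top().first << "s\n";
            heap.pop();
        }
    }

This is why the approach holds up: memory stays O(50) no matter how
many unique requests the log contains, and each log entry usually
costs just one comparison against heap.top().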
--
Jeremy Wadsack
Wadsack-Allen Digital Group