Thanks for the quick reply :-) Sadly I have no UNIX host to hand and the log files are gigabytes in size, so I can't head/tail/grep them easily. Windows grep dies... I'll write something to parse the files so I can have a real look at the records...
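A minimal sketch of the sort of "write something to parse the files" helper that would stand in for grep here. It assumes the logs are plain, line-oriented text; the function name `grep_lines` and the file name `access.log` are hypothetical. Reading line by line keeps memory use flat however large the file is.

```python
def grep_lines(path, needle, limit=20):
    """Return up to `limit` lines from the file at `path` that contain `needle`.

    The file is streamed one line at a time, so a multi-gigabyte log
    is never loaded into memory as a whole.
    """
    found = []
    # latin-1 maps every byte to a character, so odd bytes in a log
    # line cannot raise a decoding error.
    with open(path, "r", encoding="latin-1") as log:
        for line in log:
            if len(found) >= limit:
                break
            if needle in line:
                found.append(line.rstrip("\r\n"))
    return found


if __name__ == "__main__":
    # Hypothetical file name; substitute the real log path.
    for record in grep_lines("access.log", "/bdotg/action/home", limit=10):
        print(record)
```

Calling it with a browser string such as "Netscape/4", or with "%3F", would show a handful of matching raw records to inspect.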
I've some other 'funnies', like 1.98% Netscape/4 browser usage (according to the summary), but if I list something like the top 2000 browser signatures in the full browser report I can't find a single reference to Netscape/4 (of course, if I could simply grep the files... :-( ).

I'll try the PAGEINCLUDE you suggest, but something does seem to be going wrong, judging by the way parts of the query string are showing up in the File Type report. Yes, I get [no extension] pages, but based on a page tracking service I'm expecting around 460K pages and I'm 'only' seeing 395K. It's the long tail of .s=m and similar 'file types' which suggests some counting is going astray. My thought was that I could 'mop these up' by defining each 'mis-read' file type as a page, but my various attempts have failed.

I'm running 6.0/Win32, if that's an issue?

Thanks again.../Iain

-----Original Message-----
From: analog-help-boun...@lists.meer.net [mailto:analog-help-boun...@lists.meer.net] On Behalf Of Stephen Turner
Sent: 20 February 2009 19:47
To: analog-help
Subject: Re: [analog-help] Problem with page counts

2009/2/20 Iain Hunneybell <i...@ipmarketing.co.uk>:
> I am trying to analyse pages from a large 'portal' site and am having
> real problems with page counts and all attempts with PAGEINCLUDE, TYPE
> and FILEALIAS and other experiments fail.
>
> The site generates URLs similar to:
> /bdotg/action/home?r.l1=1078549133&r.lc=en&r.s=m
>
> It seems to be the period in the input vars that's causing the problem
> as the File Type report then lists things like:
>
>  reqs  %reqs  Gbytes  %bytes  extension
>  7277  0.08%    0.18   0.32%  .s=tl"
> 12683  0.15%    0.11   0.20%  .t=CAMPAIGN&furlname=selfassessment&furlparam=selfassessment"
>  4485  0.05%    0.11   0.20%  .s=m"
>
> Note the very low percentages as this is in effect counting page by
> page as a different file type.
>

I'm not seeing this. I just tried this experiment and I see this file listed as [no extension] which is correct. What do they look like in your raw logfiles?
For example, is the question mark encoded as %3F, which would be a literal question mark instead of an argument separator?

> So I've tried things like:
>
> PAGEINCLUDE *.s*
> PAGEINCLUDE *.t*
>
> (with and without the trailing *).
>
> I've also tried patterns like:
>
> PAGEINCLUDE /home
>
> But all attempts fail.
>

PAGEINCLUDE /bdotg/action/home
works for me. But if my hypothesis above is correct, you might need
PAGEINCLUDE /bdotg/action/home*

The PAGEINCLUDE has nothing to do with the file types by the way (although it's typically used that way). You can make any single file into a "page".

--
Stephen Turner
+------------------------------------------------------------------------
|  TO UNSUBSCRIBE from this list:
|    http://lists.meer.net/mailman/listinfo/analog-help
|
|  Analog Documentation: http://analog.cx/docs/Readme.html
|  List archives:  http://www.analog.cx/docs/mailing.html#listarchives
|  Usenet version: news://news.gmane.org/gmane.comp.web.analog.general
+------------------------------------------------------------------------