Thanks for the quick reply :-)

Sadly I have no UNIX host to hand, and these are multi-gigabyte files, so I
can't easily head/tail/grep them. Windows grep dies... I'll write something to
parse the files so I can take a proper look at the records.
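For what it's worth, something along these lines might do as a stand-in for
grep on Windows. It's a minimal Python sketch, assuming plain-text logfiles
with one request per line; the script name logscan.py and the function names
are made up for illustration. It reads one line at a time, so file size
shouldn't matter.

```python
import sys


def scan_lines(lines, needle, show=5):
    """Count lines containing `needle` and keep the first `show` of them."""
    count = 0
    samples = []
    for line in lines:
        if needle in line:
            count += 1
            if len(samples) < show:
                samples.append(line.rstrip("\r\n"))
    return count, samples


def scan_file(path, needle, show=5):
    # Iterating over the file object reads it line by line, so memory use
    # stays flat however large the logfile is. errors="replace" stops a
    # stray non-UTF-8 byte from aborting the scan.
    with open(path, encoding="utf-8", errors="replace") as f:
        return scan_lines(f, needle, show)


if __name__ == "__main__":
    if len(sys.argv) != 3:
        print("usage: python logscan.py LOGFILE TEXT")
    else:
        count, samples = scan_file(sys.argv[1], sys.argv[2])
        print(f"{count} matching lines")
        for sample in samples:
            print(sample)
```

For example, `python logscan.py access.log "Netscape/4"` would report how many
raw lines mention Netscape/4 and show the first few.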

I've some other 'funnies', like 1.98% Netscape/4 browser usage (according to
the summary), yet if I list something like the top 2,000 browser signatures in
the full browser report I can't find a single reference to Netscape/4 (of
course, if I could simply grep the files... :-( ).

I'll try the PAGEINCLUDE you suggest, but something seems to be going wrong,
judging by the way parts of the query string show up in the File Type report.
Yes, I do get [no extension] pages, but a page-tracking service leads me to
expect around 460K pages and I'm 'only' seeing 395K. It's the long tail of
.s=m and similar 'extensions' that suggests some counting is going astray.

My thought was that I could 'mop these up' by defining each 'mis-read'
file type as a page, but my various attempts have failed. I'm running
6.0/Win32, in case that's relevant.

Thanks again.../Iain


-----Original Message-----
From: analog-help-boun...@lists.meer.net
[mailto:analog-help-boun...@lists.meer.net] On Behalf Of Stephen Turner
Sent: 20 February 2009 19:47
To: analog-help
Subject: Re: [analog-help] Problem with page counts

2009/2/20 Iain Hunneybell <i...@ipmarketing.co.uk>:
> I am trying to analyse pages from a large 'portal' site and am having 
> real problems with page counts; all attempts with PAGEINCLUDE, TYPE 
> and FILEALIAS, and other experiments, fail.
>
> The site generates URLs similar to:
> /bdotg/action/home?r.l1=1078549133&r.lc=en&r.s=m
>
> It seems to be the periods in the query-string variables that are causing
> the problem, as the File Type report then lists things like:
>
> reqs    %reqs   Gbytes  %bytes  extension
> 7277    0.08%   0.18    0.32%   .s=tl"
> 12683   0.15%   0.11    0.20%   .t=CAMPAIGN&furlname=selfassessment&furlparam=selfassessment"
> 4485    0.05%   0.11    0.20%   .s=m"
>
> Note the very low percentages: in effect, each page is being counted
> as a different file type.
>

I'm not seeing this. I just tried this experiment and I see this file listed
as [no extension], which is correct. What do these URLs look like in your raw
logfiles? For example, is the question mark encoded as %3F? That would make it
a literal question mark instead of an argument separator.
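To illustrate that hypothesis, here is a simplified Python model (an
assumption for illustration, not analog's actual code) of how a logged URL
could be split into a file name and an 'extension'. If a literal ? splits off
the query string first, the file has no extension. If the ? arrives encoded as
%3F and is only decoded afterwards, everything after the last period in the
name becomes the 'extension', which would produce exactly the .s=m entries
seen in the File Type report.

```python
from urllib.parse import unquote


def file_and_extension(raw_url):
    """Simplified model (not analog's actual code): split a logged URL
    into a decoded file name and the 'extension' a report would show."""
    # Only a literal '?' separates the file name from the query string;
    # an encoded '%3F' is not recognised as a separator here.
    path = raw_url.split("?", 1)[0]
    # Decoding happens after the split, so '%3F' turns into a literal '?'
    # that is now part of the file name.
    name = unquote(path)
    last_segment = name.rsplit("/", 1)[-1]
    if "." in last_segment:
        ext = "." + last_segment.rsplit(".", 1)[1]
    else:
        ext = "[no extension]"
    return name, ext


for url in ("/bdotg/action/home?r.l1=1078549133&r.lc=en&r.s=m",
            "/bdotg/action/home%3Fr.l1=1078549133&r.lc=en&r.s=m"):
    print(file_and_extension(url))
```

With the literal ?, the request comes out as /bdotg/action/home with
[no extension]; with %3F in its place, the whole string becomes the file name
and the 'extension' is .s=m.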

> So I've tried things like:
>
> PAGEINCLUDE *.s*
> PAGEINCLUDE *.t*
>
> (with and without the trailing *).
>
> I've also tried patterns like:
>
> PAGEINCLUDE /home
>
> But all attempts fail.
>

PAGEINCLUDE /bdotg/action/home

works for me. But if my hypothesis above is correct, you might need

PAGEINCLUDE /bdotg/action/home*

PAGEINCLUDE has nothing to do with file types, by the way (although it's
typically used that way): you can make any single file into a "page".
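A rough way to see why the trailing * might matter: the sketch below uses
Python's fnmatch as a stand-in for analog's wildcard matching, assuming that
* matches any run of characters (fnmatch also treats ? and [ ] as special in
patterns, which these patterns don't use). The is_page helper is hypothetical.

```python
from fnmatch import fnmatchcase


def is_page(file_name, patterns):
    """Hypothetical helper: True if file_name matches any PAGEINCLUDE-style
    pattern, treating '*' as 'any run of characters'."""
    return any(fnmatchcase(file_name, p) for p in patterns)


plain = "/bdotg/action/home"  # query string already split off
encoded = "/bdotg/action/home?r.l1=1078549133&r.lc=en&r.s=m"  # %3F decoded into the name

for pattern in ("/bdotg/action/home", "/bdotg/action/home*"):
    print(pattern, is_page(plain, [pattern]), is_page(encoded, [pattern]))
```

Under these assumptions the exact pattern only catches the plain file name,
while the version with the trailing * catches both.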

--
Stephen Turner



+------------------------------------------------------------------------
|  TO UNSUBSCRIBE from this list:
|    http://lists.meer.net/mailman/listinfo/analog-help
|
|  Analog Documentation: http://analog.cx/docs/Readme.html
|  List archives:  http://www.analog.cx/docs/mailing.html#listarchives
|  Usenet version: news://news.gmane.org/gmane.comp.web.analog.general
+------------------------------------------------------------------------

