"Cunningham, Colin A" wrote:
> I'm looking for additional information w/ respect to parsing the user-agent
> field of the web log (IIS for now) for:
> (a) identifying Operating System (and flavors)
> (b) identifying Browser type (and version)
> (c) identifying likely robots or agents
>
> For (a) and (b), I was wondering if the logic used by Analog is available.
> (Should I just look in the Analog source code?)
Yes.
> Furthermore, is there some sort of standard format that different OSes and
> browsers follow in filling the user-agent field? If so, would someone kindly
> give me a pointer?
Well, IE identifies itself with 'compatible; MSIE X.X' inside the parentheses
for the browser version. Most OSes are listed at the end of the () set, after
the initial Agent_name/version part of the string. There is no de jure
standard, just a de facto one used by most agents (but not by most robots, and
not by many WAP devices, unfortunately).
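To make that layout concrete, here's a minimal sketch in Python (not Analog's
actual C logic) of pulling those pieces apart. The regexes and the sample
string are just my own illustration of the de facto convention:

import re

def parse_user_agent(ua):
    """Return (agent_name, version, parenthesized_comment); Nones if absent."""
    # Leading Agent_name/version token, e.g. "Mozilla/4.0"
    m = re.match(r'([^/\s]+)/([\d.]+)', ua)
    name, version = (m.group(1), m.group(2)) if m else (None, None)

    # First parenthesized comment, where IE and the OS usually hide
    comment = None
    p = re.search(r'\(([^)]*)\)', ua)
    if p:
        comment = p.group(1)
        # IE announces itself as "compatible; MSIE X.X" inside the parens
        ie = re.search(r'MSIE ([\d.]+)', comment)
        if ie:
            name, version = 'MSIE', ie.group(1)
    return name, version, comment

print(parse_user_agent('Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)'))
# -> ('MSIE', '5.5', 'compatible; MSIE 5.5; Windows NT 5.0')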
> I'm actually more interested in identifying robots and agents that access the
> sites I'm analysing, so I can exclude them, at least from some reports. The
> interpretation of the traffic statistics is often
> heavily influenced by bot/agent requests. E.g., on one site I'm monitoring,
> roughly 25% of page requests come from KeyNote agents, identifiable by
> user-agent="Mozilla/4.0+(compatible;+Keynote-Perspective+4.0)".
>
> I've looked into the robot exclusion proposals (see
> http://info.webcrawler.com/mak/projects/robots/robots.html and
> www.kollar.com/robots), but in practice these have weaknesses, as compliance
> is voluntary. I also don't know how current or complete the Web Robots
> Database on the webcrawler site is, were I to use it.
There was discussion here back in June/July of compiling a list of robots that
would be generated from that database. I'm not certain the project ever went
anywhere. I think Marco was also working on a hand-built robots file, but I
haven't heard anything on this since June.
> Has anyone made much progress in detecting robots in conjunction w/ Analog
> reports?
The problem with 'detecting' robots is that it's inaccurate: you'll only catch
the well-behaved and obvious ones. You could start with the list at the Web
Robots Database; I'm sure it contains most of the ones you'll see. If you want
to go further, you could make assumptions like "include all hosts that ever
requested robots.txt", but that heuristic isn't always valid.
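For what it's worth, here's a rough Python sketch of that robots.txt
heuristic. It assumes NCSA common log format, so your IIS logs would need
different field positions, and again the heuristic itself is knowingly
imperfect:

def hosts_fetching_robots_txt(logfile):
    """Collect hosts that ever requested robots.txt (likely robots)."""
    suspects = set()
    with open(logfile) as f:
        for line in f:
            fields = line.split()
            # Common log format: host ident user [date] "METHOD path HTTP/x" ...
            if len(fields) < 7:
                continue
            host, path = fields[0], fields[6]
            if path.endswith('/robots.txt'):
                suspects.add(host)
    return suspects

# Requests from these hosts could then be excluded, or at least
# reported separately, as likely robot traffic.
suspects = hosts_fetching_robots_txt('access.log')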
> Also, are other Analog users interested in adding some sort of
> robot/agent reporting capability, or at least a Robot/Agent category in
> Analog's browser reports?
I think there has been significant interest in this in the past, but every time
someone looks into doing the project "completely", they realize it's an immense
job. If you're up to it, especially parsing the database into a config file, I
think a lot of Analog users would find it useful.
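As a starting point, generating the config file could be as simple as
something like this Python sketch. The input filename and the
one-pattern-per-line format are assumptions about how you'd extract agent
names from the database, and you should check the BROWEXCLUDE syntax against
your Analog version's docs:

def write_analog_config(patterns_file, config_file):
    """Turn a list of robot user-agent substrings into Analog exclusions."""
    with open(patterns_file) as src, open(config_file, 'w') as dst:
        for line in src:
            pattern = line.strip()
            if not pattern or pattern.startswith('#'):
                continue
            # Wildcards let Analog match the substring anywhere in the field
            dst.write('BROWEXCLUDE *%s*\n' % pattern)

write_analog_config('robot-agents.txt', 'robots.cfg')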
Jeremy Wadsack
Wadsack-Allen Digital Group