----- Original Message ----- 
From: "Bruce Bradbury"
Sent: Thursday, March 14, 2002 12:02 AM


: Our request report shows 6000 requests for a
: particular (pdf) file. Does that really mean
: that 6000 people have tried to download it?
: Or does this include a large number of robot
: hits?

Robots are not excluded from Analog's Request Report,
and I don't think they can be excluded from it at all.
The simple answer to your question is: "Yes, the
reported figure includes robot 'hits'."

The harder question, what proportion of accesses
come from humans rather than robots, can't be
answered accurately from an Analog report.

[Please, someone correct me if the above is mistaken.]

As to the number of requests turning up in your
reports, you might want to read the response to
question C.11 of the Analog FAQ, "When someone
reads one of my PDF files, it scores dozens of
hits", here:

   - http://analog.cx/docs/faq.html#faq143


: Is there a rule-of-thumb for how many requests
: are genuine human requests?

Not really. Too many factors (how sophisticated the
robots are, how much of your site they reach, whether
they identify themselves, etc.) affect lump-sum
figures such as those Analog reports.

- don




--- NOTHING BEYOND THIS POINT IS GENERALLY USEFUL ---


I can think of two approaches that might permit at
least a ballpark guess for this particular case:

(1) Abuse Analog (just this once) -

       a. Construct a minimal Analog report
          configuration consisting only of
          the Operating System Report and the
          Request Report.

       b. Set FILEINCLUDE/EXCLUDE such that only
          PDF files count as 'files'.

       c. Define PAGEINCLUDE/EXCLUDE such that
          only PDF files count as 'pages'.

       d. Configure the OS report to display
          %pages, at least.

       e. Configure the Request Report to display
          #reqs, at least.

       f. Lower report floors as necessary.

       g. Run report.

    After the run, look up the number of requests for a
    given PDF and multiply it by the percentage of *all*
    page requests that the OS Report attributes to
    robots. The result is a guess at the number of 'hits'
    on that PDF that came from 'bots rather than humans.

    It should 'work', but I have not tested it myself.
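    For what it's worth, the arithmetic above amounts to
    no more than the following Python sketch. The function
    name and the 12.5% robot share are hypothetical; only
    the 6000 comes from the question.

```python
def estimate_robot_requests(file_requests: int, robot_page_pct: float) -> float:
    """Rough guess at how many of one file's requests came from robots.

    file_requests:  #reqs for the PDF, read off the Request Report
    robot_page_pct: % of all PDF 'page' requests that the OS Report
                    attributes to robots (0 to 100)
    """
    if not 0 <= robot_page_pct <= 100:
        raise ValueError("robot_page_pct must be between 0 and 100")
    return file_requests * robot_page_pct / 100.0


if __name__ == "__main__":
    # 6000 is the figure from the question; 12.5% is made up.
    robots = estimate_robot_requests(6000, 12.5)
    print(f"robots ~ {robots:.0f}, others ~ {6000 - robots:.0f}")
```

    With those made-up numbers it prints
    "robots ~ 750, others ~ 5250".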

    NOTE: This guess is nothing more than that -- a guess.

       * Analog's default ROBOTINCLUDE/EXCLUDE rules are
         not very thorough: many real robots will be
         missed in the reporting, and some non-robots may
         be misidentified as robots.

       * Applying a percentage taken across all pages (all
         PDFs, as defined above) to one specific file also
         introduces error, since robots need not request
         every PDF in the same proportion.

       * This is an abuse of Analog, and goes against the
         spirit of http://analog.cx/docs/meaning.html ;)

       * Tweaking the ROBOTINCLUDE/EXCLUDE rules may give a
         more precise definition of a robot and so improve
         the guess. See (2) below.

    NOTE: This guess is nothing more than that -- a guess.

(2) Go back to the logs -

    Grep the original logs to isolate only the lines
    containing requests for the specific file(s) you are
    interested in. Examine the User Agent strings in the
    resulting set and separate the good from the bad as
    you understand them (definite robot vs. likely
    non-robot). ;)
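    If you'd rather script the grep-and-inspect step, here
    is a rough Python sketch. It assumes your logs are in
    NCSA combined format (User Agent as the last quoted
    field); the ROBOT_HINTS list and the function names
    are hypothetical, so adjust them to what you see.

```python
import re
from collections import Counter

# Assumes NCSA Combined Log Format: the request line is the first quoted
# field and the User-Agent is the last quoted field on the line.
COMBINED = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]*\] "(?P<request>[^"]*)" \S+ \S+ '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Hypothetical, deliberately incomplete substrings that suggest a robot.
ROBOT_HINTS = ("bot", "crawler", "spider", "slurp")


def agents_for_file(lines, path):
    """Count User-Agent strings for requests whose URL path equals `path`."""
    agents = Counter()
    for line in lines:
        m = COMBINED.match(line)
        if not m:
            continue  # skip lines that are not in combined format
        parts = m.group("request").split()
        # parts[1] is the URL; drop any query string before comparing
        if len(parts) >= 2 and parts[1].split("?", 1)[0] == path:
            agents[m.group("agent")] += 1
    return agents


def split_robots(agents):
    """Split a Counter of agents into (likely robot, likely human) totals."""
    robots = humans = 0
    for agent, n in agents.items():
        if any(hint in agent.lower() for hint in ROBOT_HINTS):
            robots += n
        else:
            humans += n
    return robots, humans


if __name__ == "__main__":
    sample = [
        '1.2.3.4 - - [14/Mar/2002:00:02:00 +0000] "GET /docs/report.pdf HTTP/1.0" 200 5120 "-" "Googlebot/2.1"',
        '5.6.7.8 - - [14/Mar/2002:00:03:00 +0000] "GET /docs/report.pdf HTTP/1.1" 206 4096 "-" "Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)"',
        '5.6.7.8 - - [14/Mar/2002:00:03:01 +0000] "GET /index.html HTTP/1.1" 200 900 "-" "Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)"',
    ]
    agents = agents_for_file(sample, "/docs/report.pdf")
    print(split_robots(agents))  # (1, 1)
```

    Each matching log line counts once, so a single reader
    can account for several lines; the output is still a
    tally of requests, not of people.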

    This is really only meant for those with too much time
    on their hands. It, too, won't be completely accurate
    unless one can infallibly divine what is a robot and
    what is not.

    However, inspecting the User Agent strings present in
    your own logs (globally or for a file subset) can lead
    to better ROBOTINCLUDE/EXCLUDE rules in all of your
    reporting, provided you have the time and trust
    yourself to get them right. Extract all the unique
    User Agent strings present in your logs (using tools
    other than Analog) and see where your current
    ROBOTINCLUDE/EXCLUDE rules fail. Fix what is broken.
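    A rough Python sketch of that "extract the unique User
    Agent strings" step follows. Again it assumes logs in
    combined format. ROBOT_PATTERNS is a hypothetical
    stand-in for your own ROBOTINCLUDE patterns, and
    Python's fnmatch wildcards only approximate however
    Analog itself matches them.

```python
import re
import sys
from collections import Counter
from fnmatch import fnmatchcase

# Assumes NCSA Combined Log Format; the User-Agent is the last quoted field.
COMBINED = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]*\] "[^"]*" \S+ \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

# Hypothetical stand-ins for the ROBOTINCLUDE patterns in your config.
# fnmatch treats *, ? and [...] as wildcards: an approximation only.
ROBOT_PATTERNS = ["*bot*", "*crawler*", "*spider*"]


def unique_agents(lines):
    """Return a Counter of every distinct User-Agent string in the log."""
    agents = Counter()
    for line in lines:
        m = COMBINED.match(line)
        if m:
            agents[m.group("agent")] += 1
    return agents


def is_robot(agent, patterns=ROBOT_PATTERNS):
    """Case-insensitive wildcard test of one agent against the patterns."""
    a = agent.lower()
    return any(fnmatchcase(a, p.lower()) for p in patterns)


def unmatched_agents(agents, patterns=ROBOT_PATTERNS):
    """Agents the patterns do NOT catch, most frequent first, for review."""
    return [(a, n) for a, n in agents.most_common() if not is_robot(a, patterns)]


if __name__ == "__main__":
    # Usage (hypothetical script name): python agents.py access_log
    if len(sys.argv) > 1:
        with open(sys.argv[1], encoding="latin-1") as log:
            for agent, count in unmatched_agents(unique_agents(log)):
                print(f"{count:8d}  {agent}")
```

    The list it prints is every agent your patterns let
    through; anything in it that is plainly a robot is a
    candidate for a new ROBOTINCLUDE rule.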

    To repeat: few, if any, have time to investigate their
    logs this deeply, and the result still won't be 100%
    accurate.

------- THE ABOVE MAY NOT HAVE BEEN USEFUL AT ALL ;) -------




+------------------------------------------------------------------------
|  This is the analog-help mailing list. To unsubscribe from this
|  mailing list, go to
|    http://lists.isite.net/listgate/analog-help/unsubscribe.html
|
|  List archives are available at
|    http://www.mail-archive.com/[email protected]/
|    http://lists.isite.net/listgate/analog-help/archives/
|    http://www.tallylist.com/archives/index.cfm/mlist.7
+------------------------------------------------------------------------
