----- Original Message -----
From: "Bruce Bradbury"
Sent: Thursday, March 14, 2002 12:02 AM

: Our request report shows 6000 requests for a
: particular (pdf) file. Does that really mean
: that 6000 people have tried to download it?
: Or does this include a large number of robot
: hits?

Robots are not excluded from Analog's Request Report, nor do I think
they can be (at all). The simple answer to your question is: "Yes, the
reported figure includes robot 'hits'." The more difficult question --
the proportion of human to robot accesses -- can't be determined
accurately from an Analog report.

[Please, someone correct me if the above is mistaken.]

As to the number of requests turning up in your reports, you might
want to read the response to question C.11 of the Analog FAQ, "When
someone reads one of my PDF files, it scores dozens of hits", here:

 - http://analog.cx/docs/faq.html#faq143

: Is there a rule-of-thumb for how many requests
: are genuine human requests?

Not really. There are a lot of dependencies (robot sophistication,
penetration, identification, etc.) to consider when looking at lump
sum figures such as those reported by Analog.

 - don

--- NOTHING BEYOND THIS POINT IS GENERALLY USEFUL ---

I can think of two approaches that might permit at least a ballpark
guess for this particular case:

(1) Abuse Analog (just this once) -

    a. Construct a minimal Analog report configuration consisting
       only of the Operating System Report and the Request Report.
    b. Set FILEINCLUDE/EXCLUDE such that only PDF files count as
       'files'.
    c. Define PAGEINCLUDE/EXCLUDE such that only PDF files count as
       'pages'.
    d. Configure the OS Report to display %pages, at least.
    e. Configure the Request Report to display #reqs, at least.
    f. Lower report floors as necessary.
    g. Run the report.

    After the run, you can look at the number of requests for a given
    PDF and multiply it by the percentage of *all* page requests by
    robots listed in the OS Report. That gives a guess at the number
    of 'hits' for a given PDF coming from 'bots and not humans.
    It should 'work' but I have not tested it myself.

    NOTE: This guess is nothing more than that -- a guess.

    * The default ROBOTINCLUDE/EXCLUDE rules for Analog are not
      overly thorough -- a lot of 'real' robots will be missed in
      the reporting. Some non-robots may be misidentified as robots.
    * Comparing a percentage across all pages (as defined above) to
      a specific file (as defined above) will also introduce error
      into the guesstimate.
    * This is an abuse of Analog, and goes against the spirit of
      http://analog.cx/docs/meaning.html ;)
    * Tweaking the ROBOTINCLUDE/EXCLUDE rules may result in a more
      precise definition of a robot, improving the guess. See (2)
      below.

    NOTE: This guess is nothing more than that -- a guess.

(2) Go back to the logs -

    Grep the original logs to isolate only the lines that are
    requests for the specific file(s) in which you are interested.
    Examine the User Agent strings in the resulting set and separate
    the good from the bad, as you understand them to be (definite
    robot vs. likely non-robot). ;)

    This is really only meant for those with too much time on their
    hands. It, too, won't be completely accurate unless one can
    infallibly divine what is a robot and what is not.

    However, inspecting the User Agent strings present in your own
    logs (globally or for a file subset) can lead to better
    INCLUDE/EXCLUDE rules for robots in all of your reporting --
    provided you have the time and trust yourself to do the right
    thing. Sort out all the unique User Agent strings present in your
    logs (using tools other than Analog) and see where the
    ROBOTINCLUDE/EXCLUDE rules you have implemented fail. Fix what is
    broken.

    Repeat: Few, if any, have time to investigate their logs so
    deeply, and the result still won't be 100% accurate.

------- THE ABOVE MAY NOT HAVE BEEN USEFUL AT ALL ;) -------
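
For anyone who wants to try this outside of Analog, here is a minimal Python sketch of both guesses described above. It is an illustration only, not part of Analog. It assumes the logs are in the Apache/NCSA "combined" format (so the User Agent is present), and the names ROBOT_HINTS, tally_agents and estimate_robot_requests, the sample log lines, the file path and the 25% figure are all hypothetical. The substring test for robots is deliberately crude and will misclassify some agents, for the same reasons given in the notes above.

```python
import re
from collections import Counter

# Assumption: Apache/NCSA "combined" log format. Only the request path,
# status code and User-Agent fields are captured.
COMBINED = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]*\] "(?:\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

# Hypothetical and deliberately incomplete: build your own list by
# inspecting the unique User-Agent strings in your own logs.
ROBOT_HINTS = ("bot", "crawler", "spider", "slurp")


def is_probable_robot(agent):
    """Crude guess: the agent string contains one of the hint substrings."""
    lowered = agent.lower()
    return any(hint in lowered for hint in ROBOT_HINTS)


def tally_agents(lines, target_path):
    """Approach (2): count requests for one file, split by User-Agent.

    Returns two Counters keyed by User-Agent string:
    (probable robots, everything else). Unparseable lines are skipped.
    """
    robots, others = Counter(), Counter()
    for line in lines:
        match = COMBINED.match(line)
        if match is None or match.group("path") != target_path:
            continue
        agent = match.group("agent")
        bucket = robots if is_probable_robot(agent) else others
        bucket[agent] += 1
    return robots, others


def estimate_robot_requests(requests, robot_page_percent):
    """Approach (1): requests for a file times the robots' %pages figure."""
    return requests * robot_page_percent / 100.0


# Made-up sample data, for illustration only.
SAMPLE_LOG = [
    '10.0.0.1 - - [14/Mar/2002:00:02:00 +0000] "GET /docs/report.pdf HTTP/1.0" '
    '200 51234 "-" "Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)"',
    '10.0.0.2 - - [14/Mar/2002:00:03:00 +0000] "GET /docs/report.pdf HTTP/1.0" '
    '200 51234 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"',
    '10.0.0.3 - - [14/Mar/2002:00:04:00 +0000] "GET /index.html HTTP/1.0" '
    '200 1024 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"',
    '10.0.0.1 - - [14/Mar/2002:00:05:00 +0000] "GET /docs/report.pdf HTTP/1.0" '
    '206 8192 "-" "Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)"',
]

if __name__ == "__main__":
    robots, others = tally_agents(SAMPLE_LOG, "/docs/report.pdf")
    print("probable robots:", dict(robots))
    print("everything else:", dict(others))
    # 25 is a placeholder for the robots' %pages value in your OS Report.
    print("approach (1) guess:", estimate_robot_requests(6000, 25))
```

To run it against real logs, pass an open file object instead of SAMPLE_LOG. The two Counters also give you the list of unique User Agent strings to review by hand, which is the part that might feed back into better ROBOTINCLUDE/EXCLUDE rules. Note that the repeated request from the same browser in the sample (status 206) is counted as a separate request, just as the FAQ entry on PDF files describes.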
