Will wrote:
> Greetings everyone. This is my first post so I apologize in advance
> if I'm doing something wrong.
>
> I have inherited stats duties for our company which has about 20
> domains and lots and lots of IIS logs. We are using Urchin and cannot
> switch at this time. I want to use Analog to pre-parse my IIS log
> files on a daily basis by removing all log entries made by spiders (as
> identified by some external machine-generated spiders.cfg file).
>
> Urchin has very crappy and limited functionality for filtering
> spiders. It is clearly not doing a good job identifying crawlers so I
> figured this was my best bet, to pre-parse using Analog before Urchin
> gets its grubby hands on my log files.
>
> Can anyone help me with a .cfg file and command line syntax for
> accomplishing this? I don't want it to do any reporting or analyzing,
> just output the identical IIS log but with all spider/bot entries
> removed.
Analog won't modify your logfiles; it will only read them in and report on the contents.

If you want to physically exclude robots/spiders from your logs, you can use something as simple as the FINDSTR command included in Windows, along with a list of strings that identify spiders. You can create that list from information on http://www.robotstxt.org/ or you could create a custom list by using Analog to analyse your logs for behaviour that you identify as spider-like. (For example, you could run a Full Browser report to get a list of browser names that are obviously spiders.)

You would use FINDSTR like this to create a "no spider" version of your logfile:

    FINDSTR /V /I /G:spiders.txt ex050523.log > ns050523.log

/V prints only the lines that do not match, /I makes the match case-insensitive, and /G: reads the search strings from a file. (/F: is a different option: it reads a list of files to search.)

spiders.txt would contain a list of strings that match known spiders in your logfile. That might be agent strings or host addresses. For example, it might contain the following lines:

    googlebot
    msnbot
    slurp
    10.123.45.67

(where 10.123.45.67 is the IP address of a spider, for example).

Note that this approach can have unexpected consequences. FINDSTR matches a string anywhere in the line. For example, if you have a lot of referrals from a page called slurpy.htm, those lines would also be excluded by the entry for the Inktomi spider in the list above.

Aengus

+------------------------------------------------------------------------
|  TO UNSUBSCRIBE from this list:
|    http://lists.meer.net/mailman/listinfo/analog-help
|
|  Usenet version: news://news.gmane.org/gmane.comp.web.analog.general
|  List archives:  http://www.analog.cx/docs/mailing.html#listarchives
+------------------------------------------------------------------------
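The FINDSTR filter described above can be sketched as a small Python script. This sketch is not part of Analog or Windows. The script name nospider.py, the --fields option, SPIDER_COLUMNS, and all function names are hypothetical. It assumes the IIS logs are in W3C Extended format, with space-separated fields and "#Fields:" directive lines. By default it drops any line that contains one of the spider strings, using a plain case-insensitive substring match like FINDSTR /V /I. With --fields, it checks only the client IP (c-ip) and user agent (cs(User-Agent)) columns. That is a different technique from FINDSTR, included to avoid the slurpy.htm kind of false positive.

```python
import sys

# A sketch, assuming IIS logs in W3C Extended format (space-separated fields,
# "#Fields:" directive lines). Not part of Analog or Windows; the script name
# nospider.py and the --fields option are hypothetical.

# W3C field names checked in --fields mode (client IP and user agent).
SPIDER_COLUMNS = ("c-ip", "cs(User-Agent)")


def load_patterns(path):
    """Read one spider string per line (like spiders.txt), lowercased; blank lines skipped."""
    with open(path, encoding="utf-8") as f:
        return [line.strip().lower() for line in f if line.strip()]


def is_spider_line(line, patterns, columns=None, field_names=None):
    """Return True if any pattern occurs (case-insensitively) in the checked text.

    Without columns, or before any #Fields line has been seen, the whole line
    is searched, mirroring FINDSTR /I. With columns, only those fields are
    searched, so a request or referrer such as /slurpy.htm no longer matches "slurp".
    """
    if columns and field_names:
        values = line.rstrip("\r\n").split(" ")
        text = " ".join(
            value for name, value in zip(field_names, values) if name in columns
        )
    else:
        text = line
    text = text.lower()
    # Plain substring test, not a regular expression.
    return any(p in text for p in patterns)


def filter_log(lines, patterns, columns=None):
    """Yield every line that is not a spider hit, unchanged.

    Directive lines starting with "#" are always kept (FINDSTR would drop
    them only if they happened to match a pattern).
    """
    field_names = None
    for line in lines:
        if line.startswith("#"):
            if line.startswith("#Fields:"):
                # Remember the column order; IIS may write a new header mid-file.
                field_names = line[len("#Fields:"):].split()
            yield line
        elif not is_spider_line(line, patterns, columns, field_names):
            yield line


def main(argv):
    args = [a for a in argv[1:] if a != "--fields"]
    if len(args) != 3:
        print("usage: nospider.py SPIDERS.txt IN.log OUT.log [--fields]", file=sys.stderr)
        return
    patterns_path, in_path, out_path = args
    columns = SPIDER_COLUMNS if "--fields" in argv else None
    patterns = load_patterns(patterns_path)
    # latin-1 maps every byte to one character, and newline="" keeps the
    # original CRLF endings, so kept lines are written back byte-for-byte.
    with open(in_path, encoding="latin-1", newline="") as src, \
         open(out_path, "w", encoding="latin-1", newline="") as dst:
        dst.writelines(filter_log(src, patterns, columns))


if __name__ == "__main__":
    main(sys.argv)
```

Usage, e.g.: python nospider.py spiders.txt ex050523.log ns050523.log --fields

Restricting the match to c-ip and cs(User-Agent) stops page names like slurpy.htm from being caught. However, it will miss a spider whose identifying string appears only in some other field.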

