On 20/03/2012 14:54, Jeroen Massar wrote:
> For everybody who is "monitoring" other people's websites, please please
> please, monitor something static like /robots.txt as that can be
> statically served and is kinda appropriate as it is intended for robots.
Depends on what you are monitoring. If you're looking for layer 4 IPv6 connectivity, then robots.txt is fine. If you're trying to determine whether a site is serving active content over IPv6 and not serving HTTP errors, then it's pretty pointless to monitor robots.txt; you need to monitor /.

> Oh and of course do set the User-Agent to something logical and to be
> super nice include a contact address so that people who do check their
> logs once in a while for fishy things they at least know what is
> happening there and that it is not a process run afoul or something.

Good policy, yes. Some robots do this but others don't.

> Of course, asking before doing tends to be a good idea too.

Depends on the scale. I'm not going to ask permission to poll someone else's site every 5 minutes, and I would be surprised if they asked me the same. OTOH, if they were polling to the point that it was causing issues, that might be different.

> The IPv6 Internet already consists way too much out of monitoring by
> pulling pages and doing pings...

"Way too much" for what? IPv6 is not widely adopted.

> Fortunately that should heavily change in a few months.

We've been saying this for years. World IPv6 Day 2012 will come and go, and things are unlikely to change a whole lot. The only thing it will ensure is that people whose IPv6 configuration actively interferes with their daily Internet usage will flag themselves, so that their configuration issues can be dealt with.

> (who noticed a certain s....h company performing latency checks against
> one of his sites, which was no problem, but the fact that they were
> causing almost more hits/traffic/load than normal clients was a bit on
> the much side

If that web page is as top-heavy as this, then I'd suggest putting a cache in front of it. nginx is good for this sort of thing.

Nick
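A minimal sketch of the kind of polite IPv6 check discussed in this thread, using only Python's standard library. It assumes plain HTTP on port 80, forces IPv6 by resolving only AF_INET6 addresses, and sends a descriptive User-Agent with a contact address. The monitor name, URL and address in `USER_AGENT`, and the helper names `resolve_v6`, `request_headers`, `classify` and `check_v6`, are hypothetical placeholders, not an existing tool.

```python
import http.client
import socket

# Hypothetical identifying User-Agent: the monitor name, URL and contact
# address are placeholders, not a real service.
USER_AGENT = "example-v6-monitor/0.1 (+https://monitor.example.net/; noc@example.net)"


def resolve_v6(host: str, port: int = 80) -> str:
    """Return the first IPv6 address for host (raises socket.gaierror if none)."""
    infos = socket.getaddrinfo(host, port, socket.AF_INET6, socket.SOCK_STREAM)
    return infos[0][4][0]  # sockaddr is (address, port, flowinfo, scope_id)


def request_headers(host: str) -> dict:
    """Headers for the probe: the real Host name plus a descriptive User-Agent."""
    return {"Host": host, "User-Agent": USER_AGENT}


def classify(status: int) -> str:
    """Coarse verdict: 2xx/3xx counts as serving content, anything else is an error."""
    return "ok" if 200 <= status < 400 else "http-error"


def check_v6(host: str, path: str = "/", port: int = 80, timeout: float = 10.0) -> str:
    """Fetch path from host over IPv6 only; return 'ok', 'http-error' or 'unreachable'."""
    try:
        addr = resolve_v6(host, port)
        # Bare IPv6 literal plus an explicit port, so http.client connects to it
        # as-is; the explicit Host header keeps name-based virtual hosting working.
        conn = http.client.HTTPConnection(addr, port, timeout=timeout)
        try:
            conn.request("GET", path, headers=request_headers(host))
            status = conn.getresponse().status
        finally:
            conn.close()
    except (OSError, http.client.HTTPException):
        # DNS failure, no route, refused connection, timeout or garbled response.
        return "unreachable"
    return classify(status)
```

Calling `check_v6("www.example.com")` polls `/`, which tells you whether the site is actually serving content over IPv6, while `check_v6("www.example.com", "/robots.txt")` mostly confirms that the server answers at all. Which one is appropriate depends on what you are trying to monitor, as argued above.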