Alternatively, you could institute a rate-limiting mechanism: if a client requests more than X pages over a specified time period, return an HTTP 429 status code (Too Many Requests). Legitimate bots such as Googlebot will slow down their requests, while poorly written bots will most likely just fail.
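A minimal sketch of that idea, assuming a fixed-window counter keyed by client IP; the names (MAX_REQUESTS, WINDOW_SECONDS, check_rate_limit) and limits are illustrative, not anything prescribed by App Engine:

```python
# Hypothetical fixed-window rate limiter: allow at most MAX_REQUESTS
# per client per WINDOW_SECONDS, otherwise answer 429 Too Many Requests.
import time
from collections import defaultdict

MAX_REQUESTS = 100     # "X pages" from the advice above -- tune to taste
WINDOW_SECONDS = 60    # the "specified time period"

_hits = defaultdict(list)  # client key (e.g. IP) -> recent request times


def check_rate_limit(client_key, now=None):
    """Return (status, extra_headers) for one incoming request."""
    now = time.time() if now is None else now
    window_start = now - WINDOW_SECONDS
    # Keep only requests inside the current window, then record this one.
    recent = [t for t in _hits[client_key] if t > window_start]
    recent.append(now)
    _hits[client_key] = recent
    if len(recent) > MAX_REQUESTS:
        # Well-behaved crawlers honor Retry-After; broken ones just fail.
        return 429, {"Retry-After": str(WINDOW_SECONDS)}
    return 200, {}
```

In a real handler you would call check_rate_limit before doing any work and short-circuit the response when it returns 429. A plain dict works for a single process; on App Engine you would want memcache or similar so the counters are shared across instances.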
You could also put a trap in your robots.txt. List a URL in robots.txt that goes to a servlet; if a client hits that servlet, ban its IP for some amount of time.

On Jul 26, 7:47 pm, jswap <[email protected]> wrote:
> Thanks, Jeff, but how do I block requests by header and not by IP? I
> usually use iptables to block the requests, but cannot do so in this
> situation because then I block access to Google's PageSpeed Insights tool
> too.
>
> On Thursday, July 26, 2012 5:27:27 PM UTC-4, Jeff Schnitzer wrote:
>
> > Every fetch request from GAE includes the appid as a header... you
> > obviously see it yourself, which is how you know the appid of the
> > crawler. This is how Google enables you to block applications; just
> > block all requests with that particular header.
> >
> > Jeff

--
You received this message because you are subscribed to the Google Groups "Google App Engine" group. To post to this group, send email to [email protected]. To unsubscribe from this group, send email to [email protected]. For more options, visit this group at http://groups.google.com/group/google-appengine?hl=en.
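The robots.txt trap described above could be sketched like this. The path name, ban duration, and handle_request helper are all hypothetical; the point is that robots.txt carries a `Disallow:` line for the trap path (e.g. `Disallow: /trap-do-not-crawl`), so compliant crawlers never visit it and anything that does is misbehaving:

```python
# Hypothetical honeypot handler: robots.txt disallows TRAP_PATH, so any
# client that requests it is ignoring robots.txt and gets banned.
import time

TRAP_PATH = "/trap-do-not-crawl"  # must match the Disallow line in robots.txt
BAN_SECONDS = 3600                # illustrative ban duration

_banned = {}  # ip -> timestamp when the ban expires


def handle_request(path, ip, now=None):
    """Return an HTTP status code for one request."""
    now = time.time() if now is None else now
    if _banned.get(ip, 0) > now:
        return 403                 # still serving out an earlier ban
    if path == TRAP_PATH:
        _banned[ip] = now + BAN_SECONDS  # sprung the trap: start a ban
        return 403
    return 200
```

As with the rate limiter, an in-memory dict only works per process; a shared store (memcache, datastore) would be needed for multiple App Engine instances, and it does not answer the original question of blocking by the appid header rather than by IP.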
