Alternatively, you could institute a rate-limiting mechanism. If a
client requests more than X pages over a specified time period, serve
up an HTTP 429 (Too Many Requests) response. Legitimate bots such as
Googlebot will slow down their requests, while poorly written bots
will most likely just fail.
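
A minimal sketch of such a limiter as a servlet filter (the class name,
threshold, and window size below are all placeholders; a production
version would want a sliding window and counters shared across
instances):

import java.io.IOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import javax.servlet.*;
import javax.servlet.http.HttpServletResponse;

// Counts requests per client IP in a fixed one-minute window and
// answers with 429 once the count passes MAX_REQUESTS.
public class RateLimitFilter implements Filter {
    private static final int MAX_REQUESTS = 60;    // placeholder threshold
    private static final long WINDOW_MS = 60_000L; // one-minute window

    private final Map<String, AtomicInteger> counts = new ConcurrentHashMap<>();
    private volatile long windowStart = System.currentTimeMillis();

    public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
            throws IOException, ServletException {
        long now = System.currentTimeMillis();
        if (now - windowStart > WINDOW_MS) {
            counts.clear();            // crude window reset; fine for a sketch
            windowStart = now;
        }
        int n = counts.computeIfAbsent(req.getRemoteAddr(),
                k -> new AtomicInteger()).incrementAndGet();
        if (n > MAX_REQUESTS) {
            HttpServletResponse resp = (HttpServletResponse) res;
            resp.setStatus(429);       // Too Many Requests
            resp.setHeader("Retry-After", "60");
            return;
        }
        chain.doFilter(req, res);
    }

    public void init(FilterConfig cfg) {}
    public void destroy() {}
}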

You could also put a trap in your robots.txt: list a URL that points
to a servlet, and if a client requests that URL anyway, ban its IP for
some period of time. Well-behaved crawlers obey robots.txt and will
never touch it.
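
Something like this in robots.txt:

User-agent: *
Disallow: /trap

...and a servlet behind it (a sketch; the /trap path, the ban length,
and the static map are placeholders, and real code would keep the ban
list somewhere the rest of the app, e.g. the filter above, can check
on every request):

import java.io.IOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import javax.servlet.http.*;

// Mapped to /trap (via web.xml or @WebServlet). Only clients ignoring
// robots.txt ever request it, so record their IP with an expiry time.
public class TrapServlet extends HttpServlet {
    static final Map<String, Long> BANNED = new ConcurrentHashMap<>();
    private static final long BAN_MS = 24L * 60 * 60 * 1000; // 24-hour ban

    protected void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws IOException {
        BANNED.put(req.getRemoteAddr(), System.currentTimeMillis() + BAN_MS);
        resp.sendError(HttpServletResponse.SC_FORBIDDEN);
    }

    // Other filters/servlets can call this to honor the ban.
    static boolean isBanned(String ip) {
        Long until = BANNED.get(ip);
        if (until == null) return false;
        if (until < System.currentTimeMillis()) {
            BANNED.remove(ip);         // ban expired
            return false;
        }
        return true;
    }
}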

On Jul 26, 7:47 pm, jswap <[email protected]> wrote:
> Thanks, Jeff, but how do I block requests by header and not by IP?  I
> usually use iptables to block the requests, but cannot do so in this
> situation because then I block access to Google's PageSpeed Insights tool
> too.
>
> On Thursday, July 26, 2012 5:27:27 PM UTC-4, Jeff Schnitzer wrote:
>
> > Every fetch request from GAE includes the appid as a header... you
> > obviously see it yourself, which is how you know the appid of the
> > crawler.  This is how Google enables you to block applications; just
> > block all requests with that particular header.
>
> > Jeff
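
For the header-based blocking in the quoted thread: App Engine's
URLFetch identifies the calling app in the User-Agent header (something
like "AppEngine-Google; (+http://code.google.com/appengine; appid:
xxx)"), so a servlet filter can match on that string instead of banning
the IP, which leaves tools like PageSpeed Insights untouched. A sketch,
with "badappid" standing in for the offending crawler's actual appid:

import java.io.IOException;
import javax.servlet.*;
import javax.servlet.http.*;

// Rejects requests whose User-Agent carries a specific appid.
public class BlockAppidFilter implements Filter {
    private static final String BLOCKED = "appid: badappid"; // placeholder

    public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
            throws IOException, ServletException {
        String ua = ((HttpServletRequest) req).getHeader("User-Agent");
        if (ua != null && ua.contains(BLOCKED)) {
            ((HttpServletResponse) res).sendError(HttpServletResponse.SC_FORBIDDEN);
            return;
        }
        chain.doFilter(req, res);
    }

    public void init(FilterConfig cfg) {}
    public void destroy() {}
}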
