It would have to be done by something at "Layer 7" that understands HTTP. What web server/technology are you using? With Apache you can do it with mod_rewrite.
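If the site happens to sit behind a Python WSGI stack rather than Apache, the same header-based block could be sketched as a small middleware. This is only an illustration under stated assumptions: that the app id shows up in the User-Agent header in the form "appid: <id>" (possibly with a one-letter prefix such as "s~"), and that "example-scraper" stands in for the real app id seen in your logs. The names BLOCKED_APP_IDS, extract_app_id and BlockAppEngineApp are made up for this example.

```python
import re

# Hypothetical app id of the unwanted crawler; replace with the one from your logs.
BLOCKED_APP_IDS = {"example-scraper"}

# Assumption: the app id is embedded in the User-Agent as "appid: <id>",
# optionally prefixed with a one-letter partition marker such as "s~".
APPID_RE = re.compile(r"appid:\s*(?:[a-z]~)?([A-Za-z0-9_.:-]+)")


def extract_app_id(user_agent):
    """Return the app id found in a User-Agent string, or None if absent."""
    match = APPID_RE.search(user_agent or "")
    return match.group(1) if match else None


class BlockAppEngineApp:
    """WSGI middleware that answers 403 for requests from blocked app ids."""

    def __init__(self, app, blocked_ids=BLOCKED_APP_IDS):
        self.app = app
        self.blocked_ids = set(blocked_ids)

    def __call__(self, environ, start_response):
        app_id = extract_app_id(environ.get("HTTP_USER_AGENT", ""))
        if app_id in self.blocked_ids:
            body = b"Forbidden\n"
            start_response(
                "403 Forbidden",
                [("Content-Type", "text/plain"),
                 ("Content-Length", str(len(body)))],
            )
            return [body]
        # Everything else passes through to the wrapped application untouched.
        return self.app(environ, start_response)
```

Wrapping an existing WSGI application would then look like `application = BlockAppEngineApp(application)`. Because the check keys on the app id rather than the source IP, other traffic from the same address ranges is not affected.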
Blocking IP addresses is really a clumsy way to do it anyway, since GAE urlfetch changes IP ranges periodically.

If you really don't like the scraper, I suggest an alternative to simply blocking them. Blocking will probably just put a bunch of errors in their logs and alert them to the problem. More fun is to silently replace the content with something nefarious. The best option would probably be content that Googlebot will detect as spammy/low quality, so it kills their search ranking.

Jeff

On Thu, Jul 26, 2012 at 5:47 PM, jswap <[email protected]> wrote:
> Thanks, Jeff, but how do I block requests by header and not by IP? I
> usually use iptables to block the requests, but cannot do so in this
> situation because then I block access to Google's PageSpeed Insights tool
> too.
>
> On Thursday, July 26, 2012 5:27:27 PM UTC-4, Jeff Schnitzer wrote:
>>
>> Every fetch request from GAE includes the appid as a header... you
>> obviously see it yourself, which is how you know the appid of the
>> crawler. This is how Google enables you to block applications; just
>> block all requests with that particular header.
>>
>> Jeff
>
> --
> You received this message because you are subscribed to the Google Groups
> "Google App Engine" group.
> To view this discussion on the web visit
> https://groups.google.com/d/msg/google-appengine/-/wpzX8AzTGogJ.
>
> To post to this group, send email to [email protected].
> To unsubscribe from this group, send email to
> [email protected].
> For more options, visit this group at
> http://groups.google.com/group/google-appengine?hl=en.
