I like how your mind thinks, Jeff :)

I did some googling and found the specifics on how to block these requests 
using Apache's mod_rewrite.  For the benefit of others, I'm posting it here:

Inside your virtual host:

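RewriteEngine on
# start
# Match urlfetch requests whose User-Agent carries the offending appid
RewriteCond %{HTTP_USER_AGENT} ^AppEngine-Google;.*appid:.*steprep
# Answer 403 Forbidden and set the nolog env var for the CustomLog line below
RewriteRule .* - [F,L,E=nolog:1]
# end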

# env=!nolog tells Apache not to log when the nolog env var is set. You
# probably already have this line, so just append the " env=!nolog"
CustomLog logs/access_log combined env=!nolog
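
If you want to sanity-check the pattern before reloading Apache, here is a 
minimal Python sketch that runs the same regex against a few User-Agent 
strings. The sample strings are my assumption of what GAE urlfetch sends 
(they are not copied from real logs), and is_blocked is just a helper name I 
made up. RewriteCond does an unanchored search unless you add anchors, so 
re.search is the closest equivalent.

```python
import re

# Same pattern as the RewriteCond above.
PATTERN = re.compile(r"^AppEngine-Google;.*appid:.*steprep")


def is_blocked(user_agent: str) -> bool:
    """Return True if the RewriteCond pattern would match this User-Agent."""
    return PATTERN.search(user_agent) is not None


# Hypothetical sample headers (assumed format, not taken from real logs).
SAMPLES = [
    # Plain urlfetch request from the offending app.
    "AppEngine-Google; (+http://code.google.com/appengine; appid: s~steprep)",
    # Ordinary browser.
    "Mozilla/5.0 (X11; Linux x86_64)",
    # App that sets its own User-Agent; the GAE identifier is appended after it.
    "MyBot/1.0 AppEngine-Google; (+http://code.google.com/appengine; appid: s~steprep)",
]

if __name__ == "__main__":
    for ua in SAMPLES:
        print(is_blocked(ua), ua)
```

Note the third sample: as far as I know, GAE appends its identifier to any 
User-Agent the app sets itself, so the leading ^ would miss those requests. 
Dropping the ^ from the RewriteCond should catch both cases, but I have not 
tested that against a live scraper.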


On Thursday, July 26, 2012 9:41:06 PM UTC-4, Jeff Schnitzer wrote:
>
> It would have to be by something at "Layer 7" that understands HTTP. 
> What web server/technology are you using?  With apache you can do it 
> with mod_rewrite. 
>
> Blocking IP addresses is really a clumsy way to do it anyways since 
> GAE urlfetch changes IP ranges periodically. 
>
> If you really don't like the scraper, I suggest an alternative to 
> simply blocking them.  That'll probably just put a bunch of errors in 
> their logs and alert them to the problem.  More fun is to silently 
> replace the content with something nefarious.  The best option would 
> probably be content that Googlebot will detect as being spammy/low 
> quality so it kills their search ranking. 
>
> Jeff 
>
> On Thu, Jul 26, 2012 at 5:47 PM, jswap wrote: 
> > Thanks, Jeff, but how do I block requests by header and not by IP?  I 
> > usually use iptables to block the requests, but cannot do so in this 
> > situation because then I block access to Google's PageSpeed Insights 
> > tool too. 
> > 
> > 
> > 
> > On Thursday, July 26, 2012 5:27:27 PM UTC-4, Jeff Schnitzer wrote: 
> >> 
> >> Every fetch request from GAE includes the appid as a header... you 
> >> obviously see it yourself, which is how you know the appid of the 
> >> crawler.  This is how Google enables you to block applications; just 
> >> block all requests with that particular header. 
> >> 
> >> Jeff 
> >> 
>
>
