I like how your mind thinks, Jeff :)
I did some googling and found the specifics on how to block using Apache's
mod_rewrite. For the benefit of others, I'm posting it here:
Inside your virtual host:
RewriteEngine on
# start
RewriteCond %{HTTP_USER_AGENT} ^AppEngine-Google;.*appid:.*steprep
RewriteRule .* - [F,L,E=nolog:1]
# end
# env=!nolog tells Apache not to log when the nolog env var is set. You
# probably already have this line, so just append the " env=!nolog".
CustomLog logs/access_log combined env=!nolog
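
A quick way to sanity-check the pattern before deploying it is to run it against a few User-Agent strings. The sketch below is only an illustration in Python, not part of the Apache setup. The sample header values are made up, and the helper name is_blocked is hypothetical. It assumes, as I understand the urlfetch behaviour, that App Engine appends its "AppEngine-Google; (... appid: ...)" identifier to whatever User-Agent the app sets. If that is right, the leading ^ would miss a scraper that sends its own User-Agent prefix, and dropping the anchor may be safer.

```python
import re

# Pattern copied from the RewriteCond above.
ANCHORED = re.compile(r"^AppEngine-Google;.*appid:.*steprep")
# Hypothetical variant without the leading ^ anchor.
UNANCHORED = re.compile(r"AppEngine-Google;.*appid:.*steprep")

# Sample header values, made up for illustration only.
DEFAULT_UA = "AppEngine-Google; (+http://code.google.com/appengine; appid: s~steprep)"
CUSTOM_UA = "MyFetcher/1.0 " + DEFAULT_UA
OTHER_APP_UA = "AppEngine-Google; (+http://code.google.com/appengine; appid: s~someotherapp)"


def is_blocked(pattern, user_agent):
    """Return True if the pattern matches anywhere in the header value.

    RewriteCond does an unanchored regex match unless the pattern itself
    contains anchors, so re.search is the closest Python equivalent.
    """
    return pattern.search(user_agent) is not None


if __name__ == "__main__":
    for name, ua in [("default", DEFAULT_UA),
                     ("custom prefix", CUSTOM_UA),
                     ("other app", OTHER_APP_UA)]:
        print(name, is_blocked(ANCHORED, ua), is_blocked(UNANCHORED, ua))
```

With these sample strings, the anchored pattern catches only the default header, while the unanchored one also catches the header with a custom prefix. Neither matches the other app.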
On Thursday, July 26, 2012 9:41:06 PM UTC-4, Jeff Schnitzer wrote:
>
> It would have to be by something at "Layer 7" that understands HTTP.
> What web server/technology are you using? With apache you can do it
> with mod_rewrite.
>
> Blocking IP addresses is really a clumsy way to do it anyways since
> GAE urlfetch changes IP ranges periodically.
>
> If you really don't like the scraper, I suggest an alternative to
> simply blocking them. That'll probably just put a bunch of errors in
> their logs and alert them to the problem. More fun is to silently
> replace the content with something nefarious. The best option would
> probably be content that Googlebot will detect as being spammy/low
> quality so it kills their search ranking.
>
> Jeff
>
> On Thu, Jul 26, 2012 at 5:47 PM, jswap wrote:
> > Thanks, Jeff, but how do I block requests by header and not by IP? I
> > usually use iptables to block the requests, but cannot do so in this
> > situation because then I block access to Google's PageSpeed Insights
> > tool too.
> >
> >
> >
> > On Thursday, July 26, 2012 5:27:27 PM UTC-4, Jeff Schnitzer wrote:
> >>
> >> Every fetch request from GAE includes the appid as a header... you
> >> obviously see it yourself, which is how you know the appid of the
> >> crawler. This is how Google enables you to block applications; just
> >> block all requests with that particular header.
> >>
> >> Jeff
> >>
>
>