It would have to be done by something at "Layer 7" that understands
HTTP. What web server/technology are you using? With Apache you can
do it with mod_rewrite.
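
If your stack is not Apache, the same header check can be done in
application code. Here is a minimal sketch as a Python WSGI middleware.
It assumes the appid shows up in the User-Agent header in the form
"appid: <id>" (check your own logs for the exact header and format).
The appid "example-scraper-app" and the names extract_appid and
BlockAppIdMiddleware are made up for illustration.

```python
import re

# Hypothetical appid of the unwanted scraper; use the one from your logs.
BLOCKED_APPIDS = {"example-scraper-app"}

# Assumption: the header carries "appid: <id>", optionally with a
# one-letter partition prefix such as "s~" in front of the id.
APPID_RE = re.compile(r"appid:\s*(?:[a-z]~)?([a-z0-9.:-]+)", re.IGNORECASE)


def extract_appid(user_agent):
    """Return the lowercased appid found in a User-Agent string, or None."""
    match = APPID_RE.search(user_agent or "")
    return match.group(1).lower() if match else None


class BlockAppIdMiddleware:
    """WSGI middleware that answers 403 to requests from blocked appids."""

    def __init__(self, app, blocked=BLOCKED_APPIDS):
        self.app = app
        self.blocked = {appid.lower() for appid in blocked}

    def __call__(self, environ, start_response):
        appid = extract_appid(environ.get("HTTP_USER_AGENT"))
        if appid in self.blocked:
            body = b"Forbidden\n"
            start_response("403 Forbidden", [
                ("Content-Type", "text/plain"),
                ("Content-Length", str(len(body))),
            ])
            return [body]
        # Everything else passes through to the wrapped application.
        return self.app(environ, start_response)
```

You would wrap your existing WSGI application object with it, e.g.
application = BlockAppIdMiddleware(application). Since it matches on
the header and not the source IP, other Google services coming from
the same address ranges are unaffected as long as they don't send
that appid.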

Blocking IP addresses is really a clumsy way to do it anyway, since
GAE urlfetch changes IP ranges periodically.

If you really don't like the scraper, I suggest an alternative to
simply blocking them.  Blocking will probably just put a bunch of
errors in their logs and alert them to the problem.  More fun is to
silently replace the content with something nefarious.  The best
option would probably be content that Googlebot will detect as being
spammy/low quality, so it kills their search ranking.

Jeff

On Thu, Jul 26, 2012 at 5:47 PM, jswap <[email protected]> wrote:
> Thanks, Jeff, but how do I block requests by header and not by IP?  I
> usually use iptables to block the requests, but cannot do so in this
> situation because then I block access to Google's PageSpeed Insights tool
> too.
>
>
>
> On Thursday, July 26, 2012 5:27:27 PM UTC-4, Jeff Schnitzer wrote:
>>
>> Every fetch request from GAE includes the appid as a header... you
>> obviously see it yourself, which is how you know the appid of the
>> crawler.  This is how Google enables you to block applications; just
>> block all requests with that particular header.
>>
>> Jeff
>>
>>
> --
> You received this message because you are subscribed to the Google Groups
> "Google App Engine" group.
> To view this discussion on the web visit
> https://groups.google.com/d/msg/google-appengine/-/wpzX8AzTGogJ.
>
> To post to this group, send email to [email protected].
> To unsubscribe from this group, send email to
> [email protected].
> For more options, visit this group at
> http://groups.google.com/group/google-appengine?hl=en.
