405 is being returned for these requests anyway. The incoming rate is <1 QPS. Besides filling up your logs, I'm not sure how, if at all, this is affecting your app.
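If you do want to refuse these clients explicitly, here is a minimal sketch of the header-based blocking Jeff describes in the quoted thread, written as WSGI middleware. It rests on assumptions: that your app is a Python WSGI app, that the curl clients keep sending a `curl/...` User-Agent, and that a GAE-hosted crawler's appid shows up inside the User-Agent header as `appid: s~steprep`. The names `BLOCKED_UA_SUBSTRINGS`, `is_blocked` and `BlockByUserAgent` are made up for illustration and are not part of any App Engine API.

```python
# Hypothetical deny list: lowercase substrings to look for in the User-Agent.
# "curl/" matches the curl requests in the log; "appid: s~steprep" assumes the
# crawler's appid is carried inside the User-Agent header.
BLOCKED_UA_SUBSTRINGS = ("curl/", "appid: s~steprep")


def is_blocked(user_agent):
    """Return True if the User-Agent contains any blocked substring."""
    ua = (user_agent or "").lower()
    return any(needle in ua for needle in BLOCKED_UA_SUBSTRINGS)


class BlockByUserAgent(object):
    """WSGI middleware that answers 403 for blocked clients."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        # WSGI exposes the User-Agent request header as HTTP_USER_AGENT.
        if is_blocked(environ.get("HTTP_USER_AGENT")):
            body = b"Forbidden\n"
            start_response("403 Forbidden", [
                ("Content-Type", "text/plain"),
                ("Content-Length", str(len(body))),
            ])
            return [body]
        # Everything else goes through to the wrapped application.
        return self.app(environ, start_response)
```

You would wrap your existing WSGI application object with it, e.g. `app = BlockByUserAgent(app)`. Note that the request still reaches the app and is still logged; this only changes the response, and a client can get around it by changing its User-Agent.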
On Friday, 3 August 2012 06:08:21 UTC+10, Kate wrote:
>
> How can I block the following curl requests. Not every IP is different and
> I get 10s of 1000s of them every day.
>
> Honestly I do not know HOW to block them. What method/code?
>
> 2012-08-02 15:03:21.103 / 405 55ms 0kb curl/7.18.2 (i386-redhat-linux-gnu) libcurl/7.18.2 NSS/3.12.2.0 zlib/1.2.3 libidn/0.6.14 libssh2/0.18
>
> 132.72.23.10 - - [02/Aug/2012:13:03:21 -0700] "HEAD / HTTP/1.1" 405 124 - "curl/7.18.2 (i386-redhat-linux-gnu) libcurl/7.18.2 NSS/3.12.2.0 zlib/1.2.3 libidn/0.6.14 libssh2/0.18" "aussieclouds.appspot.com" ms=56 cpu_ms=0 api_cpu_ms=0 cpm_usd=0.000045 instance=00c61b117c41a67b1b944a189d7cc38d5365564c
> <https://appengine.google.com/instances?app_id=aussieclouds&version_id=1.360754534133043769&key=00c61b117c41a67b1b944a189d7cc38d5365564c#00c61b117c41a67b1b944a189d7cc38d5365564c>
>
> On Thursday, July 26, 2012 5:27:27 PM UTC-4, Jeff Schnitzer wrote:
>>
>> Every fetch request from GAE includes the appid as a header... you
>> obviously see it yourself, which is how you know the appid of the
>> crawler. This is how Google enables you to block applications; just
>> block all requests with that particular header.
>>
>> Jeff
>>
>> On Wed, Jul 25, 2012 at 9:35 AM, jswap <[email protected]> wrote:
>> > I run a website containing lots of doctor-related data. We get crawled by
>> > rogue crawlers from thousands of IP addresses DAILY (mostly in Russia) and
>> > we sometimes see our content show up on other websites. I define a crawler
>> > as "rogue" when it does not obey robots.txt exclusions, and the crawling
>> > company offers no benefit to us and just sucks up system resources.
>> >
>> > Google App Engine is hosting a crawler (appid: s~steprep) that is similar to
>> > the Russian ones we block. This crawler crawls us aggressively, sucks up
>> > system resources, ignores the robots.txt file, and offers no benefit to us.
>> > Per our usual policy, we have been blocking the hundreds of Google IP
>> > addresses that this crawler is crawling from. The problem is that one or
>> > more of these IP addresses also hosts Google's "PageSpeed Insights" page,
>> > located here: https://developers.google.com/speed/pagespeed/insights
>> >
>> > My questions for Google are:
>> > 1 - Is it your intention that websites be unable to block crawlers that you
>> > host?
>> > 2 - Is it your intention that websites must allow the steprep crawler in
>> > exchange for using the PageSpeed Insights tool?
>> >
>> > Some people may suggest "why not just ask the company crawling you to stop
>> > crawling you?"
>> > 1 - Some companies ignore the request.
>> > 2 - Some companies temporarily stop crawling, then show up again a few days
>> > or weeks later, at which point I have to waste time dealing with it all over
>> > again.
>> >
>> > If we were to allow every crawler to crawl our site, our server would be
>> > brought to its knees. I'm not going to waste money on increasing server
>> > resources just so more crawlers can scrape our data. Website owners need a
>> > mechanism for blocking rogue crawlers, even when they are hosted by Google
>> > App Engine.
>> >
>> > --
>> > You received this message because you are subscribed to the Google Groups
>> > "Google App Engine" group.
>> > To view this discussion on the web visit
>> > https://groups.google.com/d/msg/google-appengine/-/Bo8u134CRr8J.
>> > To post to this group, send email to [email protected].
>> > To unsubscribe from this group, send email to
>> > [email protected].
>> > For more options, visit this group at
>> > http://groups.google.com/group/google-appengine?hl=en.
>> >

--
You received this message because you are subscribed to the Google Groups "Google App Engine" group.
To view this discussion on the web visit https://groups.google.com/d/msg/google-appengine/-/RaQefanPnVMJ.
