Hi Nick - and by extension, Barry as well (unfortunately I appear to have 
sent my reply directly to him - my apologies as I didn't CC myself so I 
can't share what exactly I wrote!)

First of all, rest assured that my concerns are not necessarily with Google 
App Engine, but rather the species of search engine related API development 
frameworks that rely on that particular address space, perhaps more commonly 
referred to as cloud leveraged app platforms.

The problem is that search engines - such as Google's - are routinely 
polluted. That is not attributable to negligence, but it is the sad reality 
nonetheless. Such polluted entries (e.g. certain queries) are used as a 
vector for tampering with other, external properties. No amount of 
"sanitization" can counter the fundamental lack of a "permissible URL 
tokenizing" framework, i.e. something which communicates in a uniform manner 
to all interested parties (i.e. the Google family) what a "permissible" URL 
looks like.

Sadly, the robots.txt syntax and the nofollow,noindex meta tag both lack 
this "syntax whitelisting" feature; neither is prescriptive ("only crawl 
and index the URLs that look like this, and ignore the rest"). Of course, 
for many if not most standard on-site search queries, it is possible to 
script page headers that include nofollow,noindex meta tags. But many other 
kinds of dynamic content aren't easily "wrapped" with such headers.
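
To make the "syntax whitelisting" idea concrete, here is a minimal sketch of 
what a site owner can do on their own side today. It is only an illustration: 
the path patterns, the function names and the choice to treat every URL with 
a query string as non-permissible are assumptions of mine, and it relies on 
the crawler honouring the X-Robots-Tag response header, which only the 
well-behaved ones do.

```python
import re
from urllib.parse import urlsplit

# Hypothetical allowlist: the only path shapes this site wants indexed.
PERMISSIBLE_PATHS = [
    re.compile(r"^/$"),
    re.compile(r"^/articles/[a-z0-9-]+$"),
]


def is_permissible(url: str) -> bool:
    """Return True only for URLs without a query string that match the allowlist."""
    parts = urlsplit(url)
    if parts.query:
        # Assumption: any query string (e.g. an on-site search) is off limits.
        return False
    return any(pattern.match(parts.path) for pattern in PERMISSIBLE_PATHS)


def robots_headers(url: str) -> dict:
    """Extra response headers: mark everything outside the allowlist noindex, nofollow."""
    if is_permissible(url):
        return {}
    return {"X-Robots-Tag": "noindex, nofollow"}
```

This is still per-response signalling rather than a declaration the crawler 
reads up front, so it does not replace the prescriptive mechanism described 
above.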

And that is where abuse of poisoned search engine indexes comes into play.

Just as I can't hunt down every non-canonical URL in the Google index, 
flagging issues case by case is neither effective (my logs demonstrate 
that) nor practical (I assume you can imagine that I'm not interested in 
hunting down all search engine based botnet traffic and relating it to 
individual sources). So my alternative is to simply shut down access to 
search engines. I don't have the time or the resources to play whack-a-mole 
with the ever-increasing scourge of botnets. Incidentally, the traffic 
evolution in my logs and a cursory look at some well-known email spam 
statistics suggest that there is a quantum shift afoot: the miscreants out 
there are moving from email to targeting (particularly) smaller web 
properties with invasive "advertising" methods.

And that is exactly what I have chosen to do: the well-behaved search 
engines (Google, Bing, Yahoo) are informed via robots.txt that they are not 
welcome, and their indexes are cleared out; the ill-behaved ones are blocked 
on sight and rigorously reported to blacklists.
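
For anyone wanting to do the same, the robots.txt part can be sketched as 
follows. This is not my actual file; the user-agent tokens (Googlebot, 
Bingbot, Slurp) are assumptions that should be checked against each engine's 
own documentation, and the helper name is made up for the example. The 
standard-library parser is used only as a sanity check that the generated 
rules say what they are meant to say.

```python
from urllib.robotparser import RobotFileParser

# Assumed user-agent tokens for the crawlers named above; verify them against
# each engine's documentation before relying on this.
CRAWLERS = ["Googlebot", "Bingbot", "Slurp"]


def build_robots_txt(agents):
    """Build a robots.txt body that disallows the whole site for each agent."""
    blocks = [f"User-agent: {agent}\nDisallow: /\n" for agent in agents]
    return "\n".join(blocks)


robots_txt = build_robots_txt(CRAWLERS)

# Sanity check: parse the generated rules with the standard-library parser.
parser = RobotFileParser()
parser.parse(robots_txt.splitlines())
```

Note that this only tells compliant crawlers to stay away; the ill-behaved 
ones ignore robots.txt and have to be blocked at the server.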

Until there is something available which gives website proprietors 
(especially the small to medium sized ones!) a trivial and effective means 
to control which content is accessible for storage and further processing in 
the cloud, the internet will continue to shrink.

Indeed, I do so with a heavy heart. But I don't have the resources to keep 
my web-based property open to "play nice" with worthwhile endeavors such as 
Google App Engine, while a notorious minority of criminals (I openly prefer 
the "terrorist" moniker) runs amok with virtual impunity. And so I run a 
tight regime of wrapper security scripts (e.g. ZB Block, which I find quite 
effective and flexible).

Hopefully you now understand better; it's not that I mistrust Google, or 
Google App Engine in particular. I just can't afford to be available for 
well-intended fun and games while carrying the weight of incidental abuse at 
my own expense.

-- 
You received this message because you are subscribed to the Google Groups 
"Google App Engine" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to 
[email protected].
For more options, visit this group at 
http://groups.google.com/group/google-appengine?hl=en.
