Hi Nick, and by extension Barry as well (unfortunately I appear to have
sent my reply directly to him; my apologies, but as I didn't CC myself I
can't share exactly what I wrote!)
First of all, rest assured that my concerns are not specifically with Google
App Engine, but rather with the class of search-engine-related API
development frameworks that rely on that particular address space, perhaps
more commonly referred to as cloud-leveraged app platforms.
The problem is that search engine indexes, Google's included, are routinely
polluted; that is not attributable to negligence, but it is a sad reality
nonetheless. Such polluted entries (e.g. certain queries) are used as a
vector for tampering with other, external properties. No amount of
"sanitization" can counter the fundamental lack of a "permissible URL
tokenizing" framework, i.e. something that communicates in a uniform manner
to all interested parties (such as the Google family) what a "permissible"
URL looks like.
Sadly, the robots.txt syntax and the nofollow,noindex meta tag both lack
this "syntax whitelisting" feature; they are not prescriptive ("only crawl
and index the URLs that look like this, and ignore the rest"). Of course,
for many if not most standard on-site search queries, it is possible to
script page headers that include nofollow,noindex meta tags. But many other
kinds of dynamic content aren't easily "wrapped" with such headers.
And that is where abuse of poisoned search engine indexes comes into play.
I can't hunt down every non-canonical URL in the Google index, and flagging
issues case by case is not only ineffective (my logs demonstrate as much)
but also practically prohibitive (I assume you can imagine that I'm not
interested in tracing all search-engine-driven botnet traffic back to
individual sources). So my alternative is simply to shut out search engines.
I don't have the time or the resources to play whack-a-mole with the
ever-increasing scourge of botnets.
Incidentally, the evolution of traffic in my logs, together with a cursory
look at some well-known email spam statistics, suggests that a fundamental
shift is indeed afoot: the miscreants out there are moving away from email
and toward targeting web properties (particularly smaller ones) with
invasive "advertising" methods.
And shutting out search engines is exactly what I have chosen to do: the
well-behaved search engines (Google, Bing, Yahoo) are informed via
robots.txt that they are not welcome, and their indexes are cleared out; the
ill-behaved ones are blocked and rigorously reported to blacklists on sight.
Until there is something available which gives website proprietors
(especially the small to medium sized ones!) a trivial and effective means
to control which content is accessible for storage and further processing in
the cloud, the internet will continue to shrink.
I do this with a heavy heart. But I don't have the resources to keep my
web-based property open to "play nice" with worthwhile endeavors such as
Google App Engine while a notorious minority of criminals (I openly prefer
the "terrorist" moniker) runs amok with virtual impunity. And so I have set
up a tight regime of wrapper security scripts (e.g. ZB Block, which I find
quite effective and flexible).
Hopefully this explains my position better; it's not that I mistrust Google,
or Google App Engine in particular. I just can't afford to be available for
well-intended fun and games while carrying the weight of incidental abuse at
my own expense.
--
You received this message because you are subscribed to the Google Groups
"Google App Engine" group.
For more options, visit this group at
http://groups.google.com/group/google-appengine?hl=en.