Currently there seems to be no way to stop indexing of spammed
sites. Link spammers even spam this board automatically from time to
time. Their software is very pluggable and can be adapted to any type
of CMS and submission form.
I thought about a global, dirty solution that could hunt spam during
the indexing process. Here is the idea.
Say we have a new option for versions 3.4+:
ExternalLinkCount [maxlinks] [maxpages] [nofollow]
maxlinks is the limit on external links per page. (Spammers try
to add direct links for PageRank etc.)
maxpages is the limit on probably-spammed pages on the same host.
nofollow is true or false: filter suspected spam pages counting
links with or without nofollow.
This will delete any page that has more than 20 external links:
ExternalLinkCount 20 20
This will automatically ban and remove a site that has more than 20
pages where each page has more than 20 external links, counting
links both with and without nofollow:
ExternalLinkCount 20 20 true
This will do the previous thing counting only direct links, the ones
that play with PageRank etc.:
ExternalLinkCount 20 20 false
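The behaviour described above could be sketched roughly like this. This is a minimal illustration, not actual indexer code; all names (check_page, is_external, the link-tuple format) are hypothetical:

```python
# Rough sketch of the proposed ExternalLinkCount filter.
# All names and data structures here are hypothetical.
from collections import defaultdict
from urllib.parse import urlparse

MAX_LINKS = 20          # maxlinks: external-link limit per page
MAX_PAGES = 20          # maxpages: limit of suspect pages per host
COUNT_NOFOLLOW = True   # nofollow flag: True counts nofollow links too

suspect_pages = defaultdict(int)  # host -> number of suspect pages seen
banned_hosts = set()

def is_external(page_url, link_url):
    """A link is external when it points to a different host."""
    return urlparse(link_url).netloc not in ("", urlparse(page_url).netloc)

def check_page(page_url, links):
    """links: list of (url, is_nofollow) pairs found on the page.
    Returns 'keep', 'delete' (drop this page), or 'ban' (drop the host)."""
    host = urlparse(page_url).netloc
    if host in banned_hosts:
        return "ban"
    external = sum(
        1 for url, nofollow in links
        if is_external(page_url, url) and (COUNT_NOFOLLOW or not nofollow)
    )
    if external <= MAX_LINKS:
        return "keep"
    # Too many external links: drop the page and remember the host.
    suspect_pages[host] += 1
    if suspect_pages[host] > MAX_PAGES:
        banned_hosts.add(host)  # remove the whole site from the index
        return "ban"
    return "delete"
```

A page with few external links is kept; a page over the maxlinks limit is deleted; once a host accumulates more than maxpages such pages, the whole host is banned.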
This is not ideal: it can cut off normal pages. But webmasters who
use nofollow as Google recommends are relatively safe. It can cut
blog pages with tons of good comments, and big scientific pages,
catalogs, and wikis are probably not safe from such dirty filtering
either.
Anyway, this is probably the simplest way to catch sites that have
tons of spammed pages, and with high limits it could help. One site
currently under a spam attack generates thousands of such spammed
pages; that is why I thought about this problem in a very basic but
cruel way.
General mailing list