On 11/12/2013 15:59, b...@mnogosearch.org wrote:
Author: fasfuuiios
Message: Currently it looks like there is no way to stop spammed sites from being indexed. Link spammers even spam this board automatically from time to time. That software is very pluggable and can be adapted to any type of CMS and submit form.
One approach would be to build a table of problem/spam links. Then if a site has any of these toxic links, either drop the website or add the toxic link to a regexp.
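A minimal sketch of the table approach, assuming the table is just a set of host names. The hosts shown and the helper names `build_toxic_regex` and `toxic_links` are made up for illustration and are not part of mnoGoSearch:

```python
import re

# Hypothetical table of toxic link hosts (illustrative values only).
TOXIC_HOSTS = {"spam-pills.example", "cheap-loans.example"}


def build_toxic_regex(hosts):
    """Compile one regexp matching URLs on any listed host or its subdomains."""
    alternation = "|".join(re.escape(h) for h in sorted(hosts))
    return re.compile(
        r"^https?://(?:[^/@]+\.)?(?:" + alternation + r")(?::\d+)?(?:[/?#]|$)",
        re.IGNORECASE,
    )


def toxic_links(page_links, pattern):
    """Return the links from one page that point at a toxic host."""
    return [url for url in page_links if pattern.search(url)]


pattern = build_toxic_regex(TOXIC_HOSTS)
links = [
    "http://spam-pills.example/buy",
    "https://www.cheap-loans.example",
    "http://notspam-pills.example/",
    "http://good.example/?u=http://spam-pills.example/",
]
hits = toxic_links(links, pattern)
```

A site whose pages produce any hits could then be dropped, or the compiled pattern could be turned into a filter rule for the indexer. Since the toxic links change over time, the table would need regular rebuilding.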
I thought about a global, dirty solution that could hunt down spam during the indexing process. Here is the idea.
---------
Say we have a new option for versions 3.4+:
ExternalLinkCount [maxlinks] [maxpages] [nofollow]
maxlinks is the limit for external links on a page. (Spammers try to add direct links for PageRank etc.)
From the TLD web usage surveys I do every month (measuring how websites are used in TLDs and the percentages of active/holding page/PPC/redirects), link spam is either comment-form spam or links injected into cracked Joomla/WordPress sites. Comment spam can be blocked with a regexp. Injected link spam may be invisible to ordinary browsers but visible to search engines due to CSS rules. These sites often run an old version of Joomla or WordPress or a vulnerable plug-in. But they do not have many outbound toxic links, and these toxic links generally change each month.
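A rough sketch of how hidden injected links might be spotted, assuming the injection uses an inline `style` attribute. Real injections often hide links through stylesheet rules or off-screen positioning, which this does not catch. The class name `HiddenLinkFinder` is made up for illustration:

```python
import re
from html.parser import HTMLParser

# Inline styles that hide an element from ordinary browsers.
HIDDEN_STYLE = re.compile(r"display\s*:\s*none|visibility\s*:\s*hidden", re.IGNORECASE)
# Elements that never have a closing tag, so they are not tracked on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class HiddenLinkFinder(HTMLParser):
    """Collect hrefs of links that sit inside an element hidden by inline CSS."""

    def __init__(self):
        super().__init__()
        self.stack = []          # (tag, is_hidden) for each open element
        self.hidden_links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        hidden = bool(HIDDEN_STYLE.search(attrs.get("style") or ""))
        inside_hidden = any(is_hidden for _, is_hidden in self.stack)
        if tag == "a" and attrs.get("href") and (hidden or inside_hidden):
            self.hidden_links.append(attrs["href"])
        if tag not in VOID_TAGS:
            self.stack.append((tag, hidden))

    def handle_endtag(self, tag):
        # Close the most recent matching open element and anything nested in it.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break


page = (
    '<p><a href="http://ok.example/">ok</a></p>'
    '<div style="display: none"><a href="http://toxic.example/">x</a><br></div>'
    '<a href="http://after.example/">y</a>'
)
finder = HiddenLinkFinder()
finder.feed(page)
```

Links found this way are candidates for the toxic link table, not proof of a cracked site, since legitimate pages also hide menus and dialogs with CSS.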
This will delete any page which has more than 20 external links.
Most news sites will have more than this as they will have a network of their own sites, analytics, advertising, social media and other links. Web directories still exist and they typically have more than 20 outbound links per page.
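To see how a fixed limit behaves on real pages, the external links can be counted before deciding on a threshold. This is a sketch only: `external_link_count` and its `own_hosts` parameter (a set of sister-site hosts treated as internal) are assumptions for illustration, not mnoGoSearch options:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> element."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def external_link_count(page_url, html, own_hosts=()):
    """Count http(s) links that leave the page's host and any hosts in own_hosts."""
    internal = {urlparse(page_url).hostname, *own_hosts}
    collector = LinkCollector()
    collector.feed(html)
    count = 0
    for href in collector.hrefs:
        parsed = urlparse(urljoin(page_url, href))  # resolve relative links
        if parsed.scheme in ("http", "https") and parsed.hostname not in internal:
            count += 1
    return count


page = (
    '<a href="/about">about</a>'
    '<a href="http://news.example/story">story</a>'
    '<a href="http://sister.example/">sister site</a>'
    '<a href="https://ads.example/track">ad</a>'
    '<a href="mailto:editor@news.example">mail</a>'
)
plain = external_link_count("http://news.example/", page)
with_network = external_link_count("http://news.example/", page, own_hosts={"sister.example"})
```

Running a count like this over a sample of known-good news sites and directories would show how many legitimate pages a limit of 20 would delete.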
(Will post more later.)

Regards...jmcc