Any ideas?  Basically I'd like to know which fields aspseek draws results from besides title, keywords and description.  I've collected all sites matching %porn% into one space and created a second space with the non-porn sites.  Trouble is, the non-porn space still contains porn sites...but none of those remaining sites has %porn% in the title, keywords or description.
 
jp
Santiago
 
----- Original Message -----
Sent: Monday, January 21, 2002 7:03 PM
Subject: [aseek-users] Problems with Spam Search Engine listings

I'm having problems getting rid of spam listings.
 
In particular porn.
 
I've come up with a list of words and a series of SQL statements to check for their occurrences in urlwordsXX, etc., but there must be a better way.  Using "-" in the query won't do it either, as these people are very crafty.  Besides, you can't have a query with hundreds of 'minused' words.
 
Why isn't there a very simple way to eliminate sites via a "bad word" list?  Note I'm not talking about prior to indexing.  I'm talking about post index.  Adult word filter.
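
To make the idea concrete, here is a rough sketch of the kind of post-index "bad word" filter I mean. It is not an aspseek feature: it assumes search results can be pulled out as plain records, and the field names, the BAD_WORDS list, and the filter_results helper are all made up for illustration. Matching is a case-insensitive substring match, like %porn% in SQL.

```python
import re

# Hypothetical bad-word list; a real one would be much longer.
BAD_WORDS = {"porn", "xxx"}

# Hypothetical record fields, not aspseek's actual schema.
FIELDS = ("url", "title", "description", "keywords", "body")


def build_pattern(words):
    """Compile one case-insensitive pattern matching any bad word as a substring."""
    return re.compile("|".join(re.escape(w) for w in sorted(words)), re.IGNORECASE)


def is_clean(result, pattern, fields=FIELDS):
    """Return True when none of the checked fields contains a bad word."""
    return not any(pattern.search(result.get(f) or "") for f in fields)


def filter_results(results, words=BAD_WORDS):
    """Drop every result that matches the bad-word list in any checked field."""
    pattern = build_pattern(words)
    return [r for r in results if is_clean(r, pattern)]
```

The point is a single pass over a word list after indexing, instead of hundreds of 'minused' terms in the query itself.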
 
Also, even after I've eliminated all traces of %porn%, %Porn%, and %PORN% from the database via a comparison query against urlwords00 - urlwords15 (title, description, keywords), I still have thousands of websites containing %porn%, %PORN%, and %Porn%.  None of the remaining websites have the string in their title, description, or keywords, so at least my 'cleaning' is almost working.
 
Where is this string occurring, then, if not in title, description, or keywords?
 
Note that for testing purposes all I did was create two webspaces:  one porn free (%porn% not found in keywords, description, or title) and the other only porn.  When I search the porn free space, I STILL have occurrences of the above string.
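
To narrow down where the string is hiding, a check along these lines could report which part of a page actually contains it. This is only a sketch under assumptions: it presumes each page can be exported as a record, and the field names (including "url" and "body") and the fields_containing helper are hypothetical, not anything aspseek provides.

```python
# Hypothetical page fields to inspect, in reporting order.
FIELDS = ("url", "title", "keywords", "description", "body")


def fields_containing(page, needle, fields=FIELDS):
    """Return the names of the fields whose text contains needle, ignoring case."""
    needle = needle.lower()
    return [f for f in fields if needle in (page.get(f) or "").lower()]


# Made-up example: clean title, keywords and description, but a match in the body.
page = {
    "url": "http://example.com/holiday",
    "title": "Holiday pictures",
    "keywords": "travel, beach",
    "description": "Photos from our trip",
    "body": "Cheap PORN links here",
}
hits = fields_containing(page, "porn")
```

A page whose only hit is outside title, keywords and description would pass my current 'cleaning' and still match the search.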
 
jp
Santiago
