Note that aspseek is a full-text search engine, so it also indexes the body of HTML
documents. The results you are still finding contain %porn% in the body.
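
To check where a term occurs on a given page, something like the following can help.
This is only a rough sketch using Python's standard html.parser, not aspseek code; the
names TermLocator and locate_term and the sample page are made up for illustration.

```python
from html.parser import HTMLParser


class TermLocator(HTMLParser):
    """Records which parts of an HTML page contain a term (hypothetical helper)."""

    def __init__(self, term):
        super().__init__()
        self.term = term.lower()
        self.found_in = set()
        self.in_title = False
        self.in_body = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True
        elif tag == "body":
            self.in_body = True
        elif tag == "meta":
            # Check <meta name="keywords"> and <meta name="description">.
            attrs = dict(attrs)
            name = (attrs.get("name") or "").lower()
            content = (attrs.get("content") or "").lower()
            if name in ("keywords", "description") and self.term in content:
                self.found_in.add(name)

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False
        elif tag == "body":
            self.in_body = False

    def handle_data(self, data):
        # Case-insensitive substring match, similar in spirit to %term%.
        if self.term in data.lower():
            if self.in_title:
                self.found_in.add("title")
            elif self.in_body:
                self.found_in.add("body")


def locate_term(html, term):
    """Return the set of page parts (title, keywords, description, body) with the term."""
    parser = TermLocator(term)
    parser.feed(html)
    parser.close()
    return parser.found_in


# Made-up page: clean title and keywords, but the term appears in the body text.
sample = (
    "<html><head><title>Family recipes</title>"
    '<meta name="keywords" content="cooking, food"></head>'
    "<body><p>Free porn links here</p></body></html>"
)
print(sorted(locate_term(sample, "porn")))  # prints ['body']
```

A page like the sample passes a check on title, keywords and description, but a
full-text index still matches it on the body text.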

> Any ideas?  Basically I'd like to know where aspseek derives results from besides
> title, keyword and description, as I've eliminated all sites with %porn% by
> collecting them into a space, and creating a second space with non porn sites.
> Trouble is, the space with non porn sites still has porn sites...but none of those
> remaining porn sites has %porn% in the title, keyword or description.
>  
> jp
> Santiago
>  
> 
>      ----- Original Message ----- 
>      From: John Pinochet 
>      To: [EMAIL PROTECTED] 
>      Sent: Monday, January 21, 2002 7:03 PM
>      Subject: [aseek-users] Problems with Spam Search Engine listings
> 
>      I'm having problems getting rid of spam listings.
>       
>      In particular porn.
>       
>      I've come up with a list of words and a series of SQL statements to check for
>      their occurrences in urlwordsXX, etc etc, but there must be a better way.  "-" in
>      the query won't do it either as these people are very crafty.  Besides, you can't
>      have a query with hundreds of 'minused' words.
>       
>      Why isn't there a very simple way to eliminate sites via a "bad word" list?  Note
>      I'm not talking about prior to indexing.  I'm talking about post index.  Adult
>      word filter.
>       
>      Also, even after I've eliminated all traces of %porn%, %Porn%, and %PORN% from
>      the database via a comparison query to urlwords00 - urlwords15 (title,
>      description, keywords), I still have thousands of websites with %porn%, %PORN%,
>      and %Porn%, albeit none of the remaining websites have that in their title,
>      description, or keywords, so at least my 'cleaning' is almost working.
>       
>      Where is this string occurring then if not in title, description, or keywords?
>       
>      Note that for testing purposes all I did was create two webspaces:  one porn free
>      (%porn% not found in keywords, description, or title) and the other only porn.
>      When I search the porn free space, I STILL have occurrences of the above string.
>       
>      jp
>      Santiago

-- 
[EMAIL PROTECTED]  ICQ 7551596  Phone +7 903 6722750
Hard work may not kill you,  but why take chances?
--
