Note that aspseek is a full-text search engine, so it also indexes the body of HTML documents, not just the title, keywords and description. The results you still found contain %porn% in the body.
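A minimal Python sketch of why a title/description/keywords check is not enough. The Document record, its field names and the BAD_WORDS list are hypothetical and are not aspseek's schema or API. The sketch only shows that a case-insensitive match over the body catches pages that the meta-field check lets through. Lower-casing also covers %porn%, %Porn% and %PORN% in one pass.

```python
from dataclasses import dataclass


@dataclass
class Document:
    # Hypothetical record for one indexed page; field names are illustrative only.
    url: str
    title: str = ""
    description: str = ""
    keywords: str = ""
    body: str = ""


BAD_WORDS = ["porn"]  # hypothetical "bad word" list

META_FIELDS = ("title", "description", "keywords")
ALL_FIELDS = META_FIELDS + ("body",)


def matching_fields(doc, bad_words, fields):
    """Return the names of the given fields that contain any bad word (case-insensitive)."""
    hits = []
    for name in fields:
        text = getattr(doc, name).lower()
        if any(word.lower() in text for word in bad_words):
            hits.append(name)
    return hits


def split_spaces(docs, bad_words, fields):
    """Split docs into (clean, flagged) lists, checking only the given fields."""
    clean, flagged = [], []
    for doc in docs:
        (flagged if matching_fields(doc, bad_words, fields) else clean).append(doc)
    return clean, flagged


docs = [
    Document("http://a.example/", title="Cooking tips", body="Recipes and menus"),
    Document("http://b.example/", title="Free pictures", body="Best PORN here"),
]

# Checking only the meta fields leaves both pages in the "clean" space...
clean_meta, _ = split_spaces(docs, BAD_WORDS, META_FIELDS)
# ...while also checking the body flags the second page.
clean_all, flagged_all = split_spaces(docs, BAD_WORDS, ALL_FIELDS)
```

Here clean_meta still holds both URLs, while clean_all keeps only http://a.example/.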
> Any ideas? Basically I'd like to know where aspseek derives results from besides
> title, keywords and description, as I've eliminated all sites with %porn% by collecting
> them into one space and creating a second space with non-porn sites. Trouble is, the
> space with non-porn sites still has porn sites... but none of those remaining porn sites
> has %porn% in the title, keywords or description.
>
> jp
> Santiago
>
> ----- Original Message -----
> From: John Pinochet
> To: [EMAIL PROTECTED]
> Sent: Monday, January 21, 2002 7:03 PM
> Subject: [aseek-users] Problems with Spam Search Engine listings
>
> I'm having problems getting rid of spam listings.
>
> In particular porn.
>
> I've come up with a list of words and a series of SQL statements to check for
> their occurrences in urlwordsXX, etc., but there must be a better way. "-" in
> the query won't do it either, as these people are very crafty. Besides, you can't
> have a query with hundreds of "minused" words.
>
> Why isn't there a very simple way to eliminate sites via a "bad word" list? Note
> I'm not talking about prior to indexing. I'm talking about post-index: an adult
> word filter.
>
> Also, even after I've eliminated all traces of %porn%, %Porn%, and %PORN% from
> the database via a comparison query against urlwords00 - urlwords15 (title,
> description, keywords), I still have thousands of websites with %porn%, %PORN%,
> and %Porn%, although none of the remaining websites have that in their title,
> description, or keywords, so at least my "cleaning" is almost working.
>
> Where is this string occurring, then, if not in the title, description, or keywords?
>
> Note that for testing purposes all I did was create two webspaces: one porn-free
> (%porn% not found in keywords, description, or title) and the other porn-only.
> When I search the porn-free space, I STILL get occurrences of the above string.
>
> jp
> Santiago

-- 
[EMAIL PROTECTED] ICQ 7551596 Phone +7 903 6722750
Hard work may not kill you, but why take chances?
