Note that I forgot to include the field txt in my post. So, altogether I use the following SQL statement:
SELECT url_id FROM urlwordsNN WHERE txt LIKE '%porn%';

In my WHERE clause I alternated between txt, keywords, description, and title.

Where are the words from the indexed body of HTML documents stored? What is the field name, so I can use it in my WHERE clause?

jp
Santiago

----- Original Message -----
From: "Kir Kolyshkin" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Friday, January 25, 2002 2:45 AM
Subject: Re: [aseek-users] Problems with Spam Search Engine listings

> Note that aspseek is a full-text search engine, so it indexes the body of HTML documents. The
> results you still found contain %porn% in the body.
>
> > Any ideas? Basically I'd like to know where aspseek derives results from besides title,
> > keyword and description, as I've eliminated all sites with %porn% by collecting them into a
> > space, and creating a second space with non porn sites. Trouble is, the space with non
> > porn sites still has porn sites...but none of those remaining porn sites has %porn% in the
> > title, keyword or description.
> >
> > jp
> > Santiago
> >
> >
> > ----- Original Message -----
> > From: John Pinochet
> > To: [EMAIL PROTECTED]
> > Sent: Monday, January 21, 2002 7:03 PM
> > Subject: [aseek-users] Problems with Spam Search Engine listings
> >
> > I'm having problems getting rid of spam listings.
> >
> > In particular porn.
> >
> > I've come up with a list of words and a series of SQL statements to check for
> > their occurrences in urlwordsXX, etc., but there must be a better way. "-" in
> > the query won't do it either as these people are very crafty. Besides, you can't
> > have a query with hundreds of 'minused' words.
> >
> > Why isn't there a very simple way to eliminate sites via a "bad word" list? Note
> > I'm not talking about prior to indexing. I'm talking about post index. Adult
> > word filter.
> >
> > Also, even after I've eliminated all traces of %porn%, %Porn%, and %PORN% from
> > the database via a comparison query to urlwords00 - urlwords15 (title,
> > description, keywords), I still have thousands of websites with %porn%, %PORN%,
> > and %Porn%, albeit none of the remaining websites have that in their title,
> > description, or keywords, so at least my 'cleaning' is almost working.
> >
> > Where is this string occurring then if not in title, description, or keywords?
> >
> > Note that for testing purposes all I did was create two webspaces: one porn free
> > (%porn% not found in keywords, description, or title) and the other only porn.
> > When I search the porn free space, I STILL have occurrences of the above string.
> >
> > jp
> > Santiago
>
> --
> [EMAIL PROTECTED] ICQ 7551596 Phone +7 903 6722750
> Hard work may not kill you, but why take chances?
> --
>
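
For reference, the per-column checks described in this thread (alternating the WHERE clause between txt, keywords, description, and title, and matching porn/Porn/PORN separately) could probably be collapsed into a single case-insensitive pass over a word list. The following is only a rough sketch under stated assumptions: it uses Python's sqlite3 module with an in-memory database as a stand-in for the real ASPseek database, the table names urlwords00 to urlwords15 and the four column names are taken from the messages above and may not match the actual schema, and the helper names and demo rows are made up for illustration. It does not answer where the indexed body words are stored.

```python
import sqlite3

# Column names taken from the thread; the real ASPseek schema may differ.
COLUMNS = ("txt", "keywords", "description", "title")
# Table names urlwords00..urlwords15, as mentioned in the original post.
TABLES = [f"urlwords{i:02d}" for i in range(16)]


def find_flagged_urls(conn, bad_words, tables=TABLES, columns=COLUMNS):
    """Return the set of url_id values whose listed columns contain any bad word."""
    flagged = set()
    for table in tables:
        for word in bad_words:
            # LOWER() on the column plus a lowercased pattern makes the match
            # case-insensitive, so 'porn', 'Porn', and 'PORN' need one pattern.
            where = " OR ".join(f"LOWER({col}) LIKE ?" for col in columns)
            pattern = f"%{word.lower()}%"
            rows = conn.execute(
                f"SELECT url_id FROM {table} WHERE {where}",
                [pattern] * len(columns),
            )
            flagged.update(row[0] for row in rows)
    return flagged


def make_demo_db():
    """Build an in-memory stand-in database with made-up rows (illustration only)."""
    conn = sqlite3.connect(":memory:")
    for table in TABLES:
        conn.execute(
            f"CREATE TABLE {table} (url_id INTEGER, txt TEXT, "
            "keywords TEXT, description TEXT, title TEXT)"
        )
    conn.execute(
        "INSERT INTO urlwords00 VALUES "
        "(1, 'gardening tips', 'plants', 'a garden blog', 'Garden')"
    )
    conn.execute("INSERT INTO urlwords03 VALUES (2, 'free PORN here', '', '', 'Home')")
    conn.execute("INSERT INTO urlwords15 VALUES (3, 'news', 'Porn', 'daily news', 'News')")
    return conn


demo = make_demo_db()
flagged_ids = find_flagged_urls(demo, ["porn"])
print(sorted(flagged_ids))  # prints [2, 3]
```

A longer "bad word" list can be passed as the second argument. Like the queries in the thread, this only inspects the four named columns, so pages that match a word solely through their indexed body text would still slip through.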
