Note that I forgot to mention the txt field in my post.  So, altogether, I
use the following SQL statement:

SELECT url_id
FROM urlwordsNN
WHERE txt like '%porn%';

In the WHERE clause I tried txt, keywords, description, and title in turn.
Where are the words for "index body of HTML documents" stored?  What is the
field name, so I can use it in my WHERE clause?
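
For reference, the per-field checks above can be folded into one statement.
This is only a minimal sketch under stated assumptions: it runs against an
in-memory SQLite table that stands in for one urlwordsNN table, uses only the
column names mentioned in this thread (the real aspseek schema may differ),
and the helper names find_flagged_urls and demo are made up for illustration.
It does not answer where the body words are stored; it only shows testing all
four fields, and all case variants of a word, in a single query.

```python
import sqlite3

# Hypothetical stand-in for one urlwordsNN table; only the column names
# mentioned in this thread are used, the real aspseek schema may differ.
COLUMNS = ("title", "keywords", "description", "txt")


def find_flagged_urls(conn, table, bad_words):
    """Return sorted url_ids whose listed columns contain any bad word."""
    if not bad_words:
        return []
    # One case-insensitive LIKE test per (word, column) pair, OR-ed together.
    tests = " OR ".join(
        f"LOWER({col}) LIKE ?" for _ in bad_words for col in COLUMNS
    )
    params = [f"%{word.lower()}%" for word in bad_words for _ in COLUMNS]
    # A table name cannot be bound as a parameter, so it must come from
    # trusted code, never from user input.
    sql = f"SELECT url_id FROM {table} WHERE {tests} ORDER BY url_id"
    return [row[0] for row in conn.execute(sql, params)]


def demo():
    """Build a tiny made-up table and return the flagged url_ids."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE urlwords00 (url_id INTEGER, title TEXT, "
        "keywords TEXT, description TEXT, txt TEXT)"
    )
    conn.executemany(
        "INSERT INTO urlwords00 VALUES (?, ?, ?, ?, ?)",
        [
            (1, "Gardening tips", "plants", "How to grow roses", "Roses need sun"),
            (2, "Free PORN", "", "", ""),
            (3, "Home page", "links", "misc", "best Porn links"),
        ],
    )
    return find_flagged_urls(conn, "urlwords00", ["porn"])


if __name__ == "__main__":
    print(demo())
```

Lower-casing both sides means %porn%, %Porn%, and %PORN% no longer need
separate queries, and the word list can grow without editing the SQL by hand.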

jp
Santiago





----- Original Message -----
From: "Kir Kolyshkin" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Friday, January 25, 2002 2:45 AM
Subject: Re: [aseek-users] Problems with Spam Search Engine listings


> Note that aspseek is a full-text search engine, so it index body of HTML
> documents. The results you still found contains %porn% in body.
>
> > Any ideas?  Basically I'd like to know where aspseek derives results from
> > besides title, keyword and description, as I've eliminated all sites with
> > %porn% by collecting them into a space, and creating a second space with
> > non porn sites.  Trouble is, the space with non porn sites still has porn
> > sites...but none of those remaining porn sites has %porn% in the title,
> > keyword or description.
> >
> > jp
> > Santiago
> >
> >
> >      ----- Original Message -----
> >      From: John Pinochet
> >      To: [EMAIL PROTECTED]
> >      Sent: Monday, January 21, 2002 7:03 PM
> >      Subject: [aseek-users] Problems with Spam Search Engine listings
> >
> >      I'm having problems getting rid of spam listings.
> >
> >      In particular porn.
> >
> >      I've come up with a list of words and a series of SQL statements to
> >      check for their occurrences in urlwordsXX, etc etc, but there must be
> >      a better way.  "-" in the query won't do it either as these people
> >      are very crafty.  Besides, you can't have a query with hundreds of
> >      'minused' words.
> >
> >      Why isn't there a very simple way to eliminate sites via a "bad word"
> >      list?  Note I'm not talking about prior to indexing.  I'm talking
> >      about post index.  Adult word filter.
> >
> >      Also, even after I've eliminated all traces of %porn%, %Porn%, and
> >      %PORN% from the database via a comparison query to urlwords00 -
> >      urlwords15 (title, description, keywords), I still have thousands of
> >      websites with %porn%, %PORN%, and %Porn%, albeit none of the
> >      remaining websites have that in their title, description, or
> >      keywords, so at least my 'cleaning' is almost working.
> >
> >      Where is this string occurring then if not in title, description, or
> >      keywords?
> >
> >      Note that for testing purposes all I did was create two webspaces:
> >      one porn free (%porn% not found in keywords, description, or title)
> >      and the other only porn.  When I search the porn free space, I STILL
> >      have occurrences of the above string.
> >
> >      jp
> >      Santiago
>
> --
> [EMAIL PROTECTED]  ICQ 7551596  Phone +7 903 6722750
> Hard work may not kill you,  but why take chances?
> --
>
