I wrote:

>> Then you can grab the matching businesses by joining against this
>> 
>> table:
>> | SELECT bi.*
>> |   FROM bus_info bi, bus_search_words bsw
>> |   WHERE bsw.bus_id = bi.bus_id
>> |     AND bsw.search_word = 'target'
>> 
>> Variations on this (probably adding DISTINCT or GROUP BY with some
>> sort of counting option, etc) are probably necessary.  If you're
>> searching for multiple whole words, I'd suggest an "IN" clause.
>> 
>> | my $word_list = join ', ', map $dbh->quote($_), @words;
>> | my $sql = ( "SELECT bus_id" .
>> |             "  FROM bus_search_words" .
>> |             "  WHERE search_word IN ( $word_list )" );
>> | my $ids_aref = $dbh->selectcol_arrayref($sql);
>> 

>>>>> "Shea" == Shea Nangle <[EMAIL PROTECTED]> writes:

Shea> Is there any way to do one of the above queries if you want only
Shea> the business names that match all of the search words ? (Without
Shea> using nested selects or post-fetch processing in perl ?)  If I
Shea> understand correctly, the query above that uses the "IN" clause
Shea> returns all documents that match any of the search words...

Well, using my original DDL for bus_search_words:

| CREATE TABLE bus_search_words
| (
|     bus_id      INTEGER REFERENCES bus_info ( bus_id ),
|     search_word VARCHAR(20),
|     INDEX ( search_word )
| )

I'd just count the number of times that each bus_id shows up in the
output from my query above.  If you want to match ANY word, you only
care that it shows up at least 1 time; if you want to match ALL the
words, the count must equal the number of words in the query.  You can
do this in the perl script if you like, or perhaps using a HAVING
clause:

| SELECT bus_id, COUNT(*)
|   FROM bus_search_words
|   WHERE search_word IN ( 'foo', 'bar', 'baz' )
|   GROUP BY bus_id
|   HAVING COUNT(*) = 3

I've done things like this, and it works well.  (Note that I'm not
saying this particular table layout is necessarily a good way to do a
search engine; it feels fundamentally correct, but I would expect that
a full RDBMS is far too general to work optimally in the very narrow
ways we're using it -- e.g. a DBM or some other custom format
(particularly tries) might be more appropriate.  Benchmark.  :)

t.

Reply via email to