Thanks Peter.

----- Original Message -----
From: "Peter L. Berghold" <[EMAIL PROTECTED]>
To: "Brian" <[EMAIL PROTECTED]>
Cc: "MySQL" <[EMAIL PROTECTED]>
Sent: Tuesday, March 25, 2003 4:07 PM
Subject: Re: Your professional opinion Please...
> On Tue, 2003-03-25 at 18:11, Brian wrote:
> > What mechanism do you recommend? Something in Perl, Python, or PHP?
>
> Well... I tend to be a Perl bigot, so I'd choose Perl. 8^) I would do a
> couple of things:
>
> 1) Develop a list of words to ignore, such as "and", "if", "but", etc.
>    This may take time and iterations.
> 2) Read each file in, split on word boundaries, and tally the words
>    that are not in the exclusion list; theoretically, what is left
>    will be keywords.
> 3) Use the number of times a keyword is found in each flat text file
>    as a "weight" to be used later as a scoring mechanism for the
>    search to determine relevance.
> 4) Write all this to a table. Once all the documents are scanned,
>    THEN build your index.
>
> > Are there prebuilt modules that would develop such an index?
>
> I don't know for sure; check CPAN (www.cpan.org) and see. There may
> well be, as I'm sure someone else has had to do this before.

I will check CPAN for binary-tolerant text search engines. Thanks for
your thoughts.

Best regards,
Brian

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED]
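[Editorial note: the four steps Peter outlines above can be sketched roughly as follows. Peter suggested Perl; this is a minimal Python illustration of the same idea, and the stop-word list, document names, and document contents are hypothetical stand-ins, not anything from the thread.]

```python
import re
from collections import Counter

# Step 1: exclusion list of common words ("and", "if", "but", ...).
# In practice this list would grow over time and iterations.
STOP_WORDS = {"and", "if", "but", "the", "a", "an", "of", "to", "in", "is"}

def keyword_weights(text):
    """Steps 2-3: split on word boundaries, drop stop words, and tally
    each remaining word; the tally serves as that keyword's weight."""
    words = re.findall(r"\b\w+\b", text.lower())
    return Counter(w for w in words if w not in STOP_WORDS)

# Hypothetical stand-ins for the flat text files to be scanned.
documents = {
    "doc1.txt": "MySQL is a database and the database stores keywords",
    "doc2.txt": "Searching text in MySQL",
}

# Step 4: collect (document, keyword, weight) rows, ready to be
# written to a table before the index is built.
index_rows = []
for name, text in documents.items():
    for word, weight in keyword_weights(text).items():
        index_rows.append((name, word, weight))
```

Each row pairs a document with a keyword and its count, so the search layer can later sum or compare weights to rank matches by relevance.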