Thanks Peter:

----- Original Message ----- 
From: "Peter L. Berghold" <[EMAIL PROTECTED]>
To: "Brian" <[EMAIL PROTECTED]>
Cc: "MySQL" <[EMAIL PROTECTED]>
Sent: Tuesday, March 25, 2003 4:07 PM
Subject: Re: Your professional opinion Please...


> On Tue, 2003-03-25 at 18:11, Brian wrote:
> > What mechanism do you recommend?
> > Something in perl, python or php?

> Well... I tend to be a Perl bigot so I'd choose Perl. I would 
> do a couple of things. 

8^)

> 1) I'd develop a list of words to ignore such as "and", "if",
> "but", etc.  This may take time and iterations. 

> 2) Read each file in and split on word boundaries and tally 
> the words that are not in the exclusion list and theoretically 
> what is left will be keywords. 

> 3) Use the number of times that a keyword is found in each 
> flat text file as a "weight" to be used later as a scoring 
> mechanism for the search to determine relevance. 

> 4) Write all this to a table. Once all the documents are scanned 
> THEN build your index. 

> > Are there prebuilt modules that would develop such an index?
 
> I don't know for sure; check CPAN (www.cpan.org) and see. 
> There may well be, as I'm sure someone else has had to do 
> this before. 
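The four steps above could be sketched roughly as follows. Brian asked about Perl, Python, or PHP; this is a minimal Python sketch, and the stop-word list, tokenizing regex, and row layout are illustrative assumptions, not a finished design:

```python
import re
from collections import Counter
from pathlib import Path

# Step 1: a small exclusion ("stop word") list. Illustrative only;
# as Peter notes, a real list would grow over several iterations.
STOP_WORDS = {"and", "if", "but", "the", "a", "an", "or", "of", "to", "in"}

def keyword_weights(text):
    """Steps 2-3: split on word boundaries, drop stop words, and
    tally what is left; the counts serve as relevance weights."""
    words = re.findall(r"\b[a-z]+\b", text.lower())
    return Counter(w for w in words if w not in STOP_WORDS)

def scan_documents(paths):
    """Step 4: collect (filename, keyword, weight) rows, ready to be
    written to a table; the index is built only after all documents
    have been scanned."""
    rows = []
    for path in paths:
        counts = keyword_weights(Path(path).read_text())
        rows.extend((path, word, n) for word, n in counts.items())
    return rows
```

The rows would then be bulk-inserted into a MySQL table and indexed on the keyword column once the scan is complete, matching the "scan first, THEN build your index" ordering above.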

I will check CPAN for binary-tolerant text search engines.

Thanks for your thoughts.

Best regards,

Brian



-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]
