What version of MySQL are you using?  Version 4, which is labeled alpha but
is quite stable, includes new full-text capabilities.

In v3.23.23 (and later), you can create FULLTEXT indexes on VARCHAR and TEXT
columns, then search with the MATCH operator, which will be far, far faster
than LIKE -- and returns a relevancy score.

More here:  http://www.mysql.com/doc/en/Fulltext_Search.html
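Here is a toy Python sketch of why an index beats LIKE. It is not MySQL's implementation. LIKE '%word%' has to look at every row. A FULLTEXT-style inverted index maps each word to the rows that contain it, so a search touches only those rows. The class name InvertedIndex, the sample data, and the tf-idf-style score are assumptions for illustration. MySQL computes its relevance value with its own formula.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    # Lowercase, then split on runs of non-word characters.
    return re.findall(r"\w+", text.lower())

def like_search(docs, word):
    # Rough analogue of WHERE body LIKE '%word%': every row is scanned,
    # and substrings match ("robot" matches "robots").
    needle = word.lower()
    return [doc_id for doc_id, text in docs.items() if needle in text.lower()]

class InvertedIndex:
    """Hypothetical toy index: word -> {doc_id: term frequency}."""

    def __init__(self, docs):
        self.n_docs = len(docs)
        self.postings = defaultdict(dict)
        for doc_id, text in docs.items():
            for word, tf in Counter(tokenize(text)).items():
                self.postings[word][doc_id] = tf

    def search(self, word):
        # Only documents that contain the whole word are touched.
        hits = self.postings.get(word.lower(), {})
        if not hits:
            return []
        # Assumed tf-idf-style weighting, not MySQL's relevance formula.
        idf = math.log(self.n_docs / len(hits)) + 1.0
        scored = [(doc_id, tf * idf) for doc_id, tf in hits.items()]
        # Highest score first; ties broken by document id.
        return sorted(scored, key=lambda pair: (-pair[1], pair[0]))
```

Usage: with docs = {1: "MySQL full-text search", 2: "Perl robots and MySQL", 3: "robots robots crawl"}, InvertedIndex(docs).search("robots") ranks document 3 ahead of document 2 because the word occurs there twice. A search for "robot" finds nothing, because the index matches whole words. like_search(docs, "robot") returns both documents, because LIKE matches substrings.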

In 4.0.1 and later, you can also do Boolean searches.
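Here is a rough Python illustration of what Boolean mode adds. It handles only two of the operators MySQL documents for IN BOOLEAN MODE. A leading + means the word must be present, and a leading - means it must be absent. Bare words are treated as "any of these." In MySQL, optional words also affect ranking, and this sketch ignores that. The function names and sample data are hypothetical.

```python
import re
from collections import defaultdict

def tokenize(text):
    return re.findall(r"\w+", text.lower())

def build_index(docs):
    # word -> set of document ids containing that word
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index

def boolean_search(index, all_ids, query):
    """Toy subset of Boolean search: +word required, -word excluded."""
    required, excluded, optional = [], [], []
    for term in query.lower().split():
        if term.startswith("+"):
            required.append(term[1:])
        elif term.startswith("-"):
            excluded.append(term[1:])
        else:
            optional.append(term)

    if required:
        # Intersect the posting sets of all required words.
        result = set(all_ids)
        for word in required:
            result &= index.get(word, set())
    else:
        # No required words: any optional word is enough.
        result = set()
        for word in optional:
            result |= index.get(word, set())

    # Drop every document containing an excluded word.
    for word in excluded:
        result -= index.get(word, set())
    return sorted(result)
```

For example, with documents {1: "mysql fulltext index", 2: "perl robot crawler", 3: "mysql robot crawler"}, the query "+mysql -robot" returns only document 1.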

Here's a description of what's planned for v4 in terms of full-text search:

The new FULLTEXT search properties of MySQL 4.0 enable the use of FULLTEXT
indexing of large text masses with both binary and natural language
searching logic. Users can customise minimal word length and define their
own stop word lists in any human language, enabling a new set of
applications to be built on MySQL.
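The minimum word length and the stop word list act as filters on which words reach the index. Below is a toy Python version of that filtering step. The function name and the tiny stop word list are hypothetical. The four-character default is meant to mirror what I understand to be MySQL's default for the ft_min_word_len server variable, so check the manual for your version.

```python
import re

# Hypothetical, deliberately tiny stop word list; a real one would be
# much longer and specific to the language of the documents.
STOP_WORDS = frozenset({"about", "there", "which", "would"})

def index_terms(text, min_word_len=4, stop_words=STOP_WORDS):
    """Return the words from text that would go into the index."""
    terms = []
    for word in re.findall(r"\w+", text.lower()):
        if len(word) < min_word_len:
            continue  # shorter than the minimum word length
        if word in stop_words:
            continue  # listed as a stop word
        terms.append(word)
    return terms
```

For "Which robots would crawl the web", only "robots" and "crawl" are kept. Lowering min_word_len to 3 would also let "the" and "web" through. Short words and common words are exactly what these settings are meant to control.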

Although I've done a lot of what you've described, I haven't yet made much
use of the full-text capabilities, so I can't offer much first-hand
knowledge.  But if you've already gotten things working with MySQL, it
would make sense to use its capabilities before trying something else.

--
Nick Arnett
Phone/fax: (408) 904-7198
[EMAIL PROTECTED]


> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]On
> Behalf Of Matthias Jaekle
> Sent: Tuesday, August 27, 2002 7:03 AM
> To: [EMAIL PROTECTED]
> Subject: [Robots] Full text indexing ?
>
>
> Hello,
>
> I would like to index many files using a MySQL database. Before a file
> is indexed, I would like to check its content with a Perl script,
> which also decides whether the file is worth indexing.
>
> Before a file is downloaded, I would like to check with Perl code
> whether the link seems interesting and decide based on the
> link name.
>
> The web interface for accessing the crawler's database should be in PHP.
> I would like the layout to be like Google's. It would be nice if I could
> implement my own ranking modules.
>
> Currently I use LWP::RobotUA, write the interesting files completely
> into a MySQL database, and search the database with LIKE '%$WORD%', which
> is very slow.
>
> There are several systems available:
>   DBIx::KwIndex, DBIx::Fulltext, DBIx::TextIndex
>   Glimpse, htdig, SWISH-E, Isearch, WordIndex
>
> Could somebody recommend one of these systems for these needs?
>
> Many thanks
>
> Matthias Jaekle
>
> _______________________________________________
> Robots mailing list
> [EMAIL PROTECTED]
> http://www.mccmedia.com/mailman/listinfo/robots
