On Thu, Jan 15, 2004 at 02:52:30PM -0500, Brent Baisley wrote:
> It sounds like you are trying to do full text searching, but you 
> implemented it "manually". Was MySQL's full text indexing not 
> sufficient for your needs or am I totally missing what you are trying 
> to do?

You're absolutely right: it's full text searching.

While the MySQL functionality for this is quite nice, I'm
trying to do a lot more than is available with MySQL currently.
So, yes: MySQL full text indexing is not sufficient (at least,
as I understand it).

More background:

Fundamentally, MySQL offers only one major method for
doing text retrieval (with some configurability, of course).

My application is an experimental platform for comparing
fundamentally different approaches to information retrieval.  So
I need functionality quite different from what MySQL's full text
searching provides, built on top of the core ability to
identify the documents that contain particular terms.  (Some of
the approaches are: variations on Boolean retrieval,
the Vector Space Model, Latent Semantic Indexing, and
Probabilistic Information Retrieval.)

In the lingo of IR, what I'm trying to build with MySQL
is the "inverted index."  Here the terms are the keys,
and the data items are the documents in which those terms occur.
I've expanded my data items to also record the paragraph,
the HTML tag, the term's position in the term sequence, and the
weight of the term in the document.  Among other things, this
enables sub-document retrieval (i.e., paragraph-level), adjacency
searching, phrase searching, and more.  It also lets me experiment
with very different term and document weighting methods.
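
To make the structure concrete, here is a minimal in-memory sketch of
such an inverted index, with a phrase search built on the stored term
positions.  It is an illustration only, not the irtools schema: the
Posting field names, the whitespace tokenizer, and the constant weight
of 1.0 are all assumptions made for the example.

```python
from collections import defaultdict, namedtuple

# Hypothetical posting layout; field names are illustrative only.
Posting = namedtuple("Posting", "doc_id paragraph tag position weight")


def build_index(docs):
    """Build {term: [Posting, ...]} from {doc_id: [(paragraph, tag, text), ...]}."""
    index = defaultdict(list)
    for doc_id, parts in docs.items():
        position = 0  # running term sequence number within the document
        for paragraph, tag, text in parts:
            for term in text.lower().split():  # naive tokenizer (assumption)
                # The weight is a placeholder; a real experiment would plug in
                # its own term weighting scheme here.
                index[term].append(Posting(doc_id, paragraph, tag, position, 1.0))
                position += 1
    return index


def phrase_search(index, phrase):
    """Return sorted (doc_id, paragraph) pairs containing the phrase.

    The terms must be adjacent, in order, and within a single paragraph.
    """
    terms = phrase.lower().split()
    if not terms:
        return []
    # Candidate phrase starts: every occurrence of the first term.
    starts = {(p.doc_id, p.paragraph, p.position) for p in index.get(terms[0], [])}
    for offset, term in enumerate(terms[1:], start=1):
        occurrences = {
            (p.doc_id, p.paragraph, p.position) for p in index.get(term, [])
        }
        # Keep a start only if this term sits exactly `offset` positions later.
        starts = {
            (doc_id, paragraph, pos)
            for (doc_id, paragraph, pos) in starts
            if (doc_id, paragraph, pos + offset) in occurrences
        }
    return sorted({(doc_id, paragraph) for (doc_id, paragraph, _) in starts})
```

Calling phrase_search(build_index(docs), "quick brown") returns the
(document, paragraph) pairs where the two terms occur side by side.
The same postings could be stored as rows of one table keyed on the
term, which is roughly the shape of the problem I'm describing.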

If you're really a glutton for punishment, look at the
current source tree via CVS at http://sourceforge.net/projects/irtools
  -- Greg

> On Jan 15, 2004, at 1:53 PM, Gregory Newby wrote:
> 
> >I'm using MySQL for an information retrieval application where word
> >occurrences are indexed.  It seems that performance is not as good as
> >I would expect (it seems nearly linear with the number of rows).
> >Any advice would be welcome.  I'll lay out a lot of detail.
> >
> -- 
> Brent Baisley
> Systems Architect
> Landover Associates, Inc.
> Search & Advisory Services for Advanced Technology Environments
> p: 212.759.6400/800.759.0577
> 

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql