On Thu, Jan 15, 2004 at 02:52:30PM -0500, Brent Baisley wrote:
> It sounds like you are trying to do full text searching, but you
> implemented it "manually". Was MySQL's full text indexing not
> sufficient for your needs or am I totally missing what you are trying
> to do?

You're absolutely right: it's full text searching. While the MySQL
functionality for this is quite nice, I'm trying to do a lot more than
is available with MySQL currently. So, yes: MySQL full text indexing
is not sufficient (at least, as I understand it).

More background:

Fundamentally, MySQL offers only one major method for doing text
retrieval (with some configurability, of course). My application is
an experimental platform for different fundamental approaches to
information retrieval. So, I need to build very different
functionality than MySQL has for full text searching, on top of the
core ability to identify documents with particular terms.

(Some of the fundamental approaches are: variations on Boolean
retrieval, the Vector Space Model, Latent Semantic Indexing, and
Probabilistic Information Retrieval.)

In the lingo of IR, what I'm trying to build with MySQL is the
"inverted index." This is where the terms are the keys, and the data
items are the documents in which those terms occur.

I've expanded my data items to also include which paragraph, which
HTML tag, which term sequence, and the weight of the term in the
document. Among other things, this enables sub-document retrieval
(i.e., paragraph-level), adjacency searching, phrase searching, and
more. It lets me experiment with very different term and document
weighting methods, too.

If you're really a glutton for punishment, look at the current source
tree via CVS at http://sourceforge.net/projects/irtools

  -- Greg

> On Jan 15, 2004, at 1:53 PM, Gregory Newby wrote:
>
> >I'm using MySQL for an information retrieval application where word
> >occurrences are indexed. It seems that performance is not as good as
> >I would expect (it seems nearly linear with the number of rows).
> >Any advice would be welcome. I'll lay out a lot of detail.
> >
> --
> Brent Baisley
> Systems Architect
> Landover Associates, Inc.
> Search & Advisory Services for Advanced Technology Environments
> p: 212.759.6400/800.759.0577
>

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]
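
For anyone following the thread, the inverted index described in the
reply above (terms as keys; each data item recording the document,
paragraph, HTML tag, term sequence, and weight) might be sketched in
memory roughly as follows. This is only an illustration, not the
irtools code or its MySQL schema: the names Posting, build_index, and
phrase_search are hypothetical, the input layout is assumed, and the
weight is a constant placeholder rather than any real weighting method.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Posting:
    """One occurrence of a term (hypothetical record layout)."""
    doc_id: int
    paragraph: int
    tag: str
    position: int   # term sequence number, counted across the whole document
    weight: float


def build_index(docs):
    """Build term -> list of Postings.

    docs is assumed to look like {doc_id: [(paragraph, tag, [terms]), ...]}.
    """
    index = defaultdict(list)
    for doc_id, blocks in docs.items():
        position = 0
        for paragraph, tag, terms in blocks:
            for term in terms:
                # Placeholder weight of 1.0; a real system would compute this.
                index[term.lower()].append(
                    Posting(doc_id, paragraph, tag, position, 1.0))
                position += 1
    return index


def phrase_search(index, phrase):
    """Return sorted (doc_id, paragraph) pairs where the phrase starts.

    Uses adjacency of term sequence numbers. Positions run across the
    whole document, so a match may straddle a paragraph boundary; it is
    reported under the paragraph of its first term.
    """
    terms = [t.lower() for t in phrase.split()]
    if not terms:
        return []
    # Candidate phrase starts: every occurrence of the first term.
    starts = {(p.doc_id, p.position): p.paragraph
              for p in index.get(terms[0], [])}
    for offset, term in enumerate(terms[1:], start=1):
        present = {(p.doc_id, p.position) for p in index.get(term, [])}
        # Keep only starts where this term appears exactly `offset` later.
        starts = {key: para for key, para in starts.items()
                  if (key[0], key[1] + offset) in present}
    return sorted({(doc_id, para) for (doc_id, _), para in starts.items()})


docs = {
    1: [(0, "h1", ["Inverted", "index"]),
        (1, "p", ["an", "inverted", "index", "maps", "terms", "to", "documents"])],
    2: [(0, "p", ["index", "inverted", "order"])],
}
index = build_index(docs)
print(phrase_search(index, "inverted index"))
```

Keeping the paragraph and term sequence in each posting is what makes
paragraph-level hits and phrase matching possible here; a document-only
posting list could answer Boolean queries but not these.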