Is Christian correct?
--
David Adams
Computing Services
Southampton University
----- Original Message -----
From: "ryc" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>; "Christian Jaeger"
<[EMAIL PROTECTED]>
Sent: Thursday, September 13, 2001 5:14 PM
Subject: Re: Fulltext indexing libraries (perl/C/C++)
> I think what you are looking for is called mifluz and is the indexing
> library that htdig uses. The link is http://www.gnu.org/software/mifluz/ .
>
> If you develop any kind of bindings to use mifluz to index a mysql
> database, let me know. I would definitely be interested.
>
> ryan
>
> ----- Original Message -----
> From: "Christian Jaeger" <[EMAIL PROTECTED]>
> To: <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
> Sent: Wednesday, September 12, 2001 9:42 PM
> Subject: Fulltext indexing libraries (perl/C/C++)
>
>
> > Hello
> >
> > [ I'm crossposting this to dbi-users because it might be of interest
> > there too. It's probably better not to reply to both lists, thanks. ]
> >
> > While programming a journal in perl/axkit I realize that the problems
> > of both creating useful indexes for searching content efficiently and
> > parsing user input and creating the right sql queries from it are sooo
> > common that there *must* be some good library already. :-) So I
> > headed over to CPAN, but didn't really find what I was looking for.
> >
> > It should create indexes that are efficiently searchable in mysql,
> > i.e. using only <select ... where ... like "abcd%"> queries, not
> > "%abc%". It should allow searching for word parts (e.g. find
> > "fulltext" when entering "text"), and searching multiple form fields
> > at once (e.g. one field for title words, one for author names, etc.).
> > Preferably it allows some sort of query rules (AND/NOT/OR or
> > something), does some relevance sorting, and lets me hook some
> > numbers (link or access counts etc.) into the relevance sorting.
> >
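One way to get word-part search out of prefix-only LIKE queries is to index
every suffix of each word, so that "text%" matches the suffix "text" of
"fulltext". The following is only a minimal sketch of that idea, not something
proposed in the thread: the table name suffix_index, its columns, and the
min_len cutoff are hypothetical, and an in-memory SQLite database stands in
for mysql so the example is self-contained.

```python
import sqlite3


def suffixes(word, min_len=3):
    """Return every suffix of `word` that is at least `min_len` characters long."""
    word = word.lower()
    return [word[i:] for i in range(len(word) - min_len + 1)]


def build_index(conn, docs):
    """Store one row per (suffix, word, doc_id); the suffix column is indexed."""
    # Hypothetical schema, chosen only for this illustration.
    conn.execute("CREATE TABLE suffix_index (suffix TEXT, word TEXT, doc_id INTEGER)")
    conn.execute("CREATE INDEX idx_suffix ON suffix_index (suffix)")
    for doc_id, text in docs.items():
        for word in set(text.split()):
            for suf in suffixes(word):
                conn.execute(
                    "INSERT INTO suffix_index VALUES (?, ?, ?)",
                    (suf, word.lower(), doc_id),
                )


def search(conn, fragment):
    """Find words containing `fragment`, using only a prefix LIKE on the suffix column."""
    # Note: '%' or '_' inside `fragment` is not escaped in this sketch.
    rows = conn.execute(
        "SELECT DISTINCT doc_id, word FROM suffix_index "
        "WHERE suffix LIKE ? ORDER BY doc_id, word",
        (fragment.lower() + "%",),
    )
    return rows.fetchall()


conn = sqlite3.connect(":memory:")
build_index(conn, {1: "Fulltext indexing libraries", 2: "plain text search"})
print(search(conn, "text"))  # [(1, 'fulltext'), (2, 'text')]
```

The trade-off is index size: a word of n characters produces up to n - min_len + 1
rows, so the cutoff matters for long words.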
> > I think there are 3 tough parts which are needed:
> > 1. creation of sophisticated index structures (inverted indexes)
> > 2. somehow recognize sub-word boundaries to split words on. Maybe use
> > some form of thesaurus? Or syllables? (I suspect it should be the
> > same rules as for splitting words on line boundaries)
> > 3. user input parser / query creator
> >
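Parts 1 and 3 of the list above can be illustrated together with a tiny
in-memory inverted index and a very simple query evaluator. This is a hedged
sketch under stated assumptions, not a description of any existing library:
the function names are made up, words are split on whitespace only, and the
operators are applied strictly left to right with no precedence or
parentheses.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercased word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def run_query(index, all_ids, query):
    """Evaluate a query such as 'a AND b', 'a OR b' or 'a NOT b'.

    Operators are applied left to right; adjacent terms default to AND.
    """
    result = None
    op = "AND"
    for token in query.split():
        if token in ("AND", "OR", "NOT"):
            op = token
            continue
        hits = index.get(token.lower(), set())
        if result is None:
            # First term: a leading NOT means "everything except these hits".
            result = set(all_ids) - hits if op == "NOT" else set(hits)
        elif op == "AND":
            result &= hits
        elif op == "OR":
            result |= hits
        else:  # NOT
            result -= hits
        op = "AND"
    return sorted(result or [])


docs = {
    1: "perl fulltext search",
    2: "mysql fulltext index",
    3: "perl dbi mysql",
}
index = build_inverted_index(docs)
print(run_query(index, docs, "fulltext AND perl"))   # [1]
print(run_query(index, docs, "mysql NOT fulltext"))  # [3]
```

A real parser would also need quoting, precedence and per-field terms; the
point here is only that each operator maps onto a set operation over
posting lists.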
> > Why not:
> >
> > - use mysql's fulltext indexes? Because I think that currently they
> > are too limited (see the user comments about them at
> > www.mysql.com/doc/). They should be better in mysql-4, I read, but we
> > need it in a few weeks already. And they are also not supported in
> > InnoDB, which we want to use.
> >
> > - use indexing robots? Because we work with XML documents, and would
> > like to both keep the index up to date immediately and split
> > the XML contents into several parts (e.g. there's a title, a byline,
> > etc., which should be searchable or weighted differently). We want a
> > *library*, not a finished product.
> >
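The idea of splitting an XML document into separately weighted parts, and
of hooking an external number into the ranking, can be sketched as follows.
Everything specific here is an assumption made for illustration: the element
names (title, byline, body), the weights, and the way the popularity figure
scales the score are hypothetical and are not taken from the thread.

```python
import xml.etree.ElementTree as ET

# Hypothetical per-field weights: a hit in the title counts more than one in the body.
FIELD_WEIGHTS = {"title": 3.0, "byline": 2.0, "body": 1.0}


def field_terms(xml_text):
    """Return {field name: list of lowercased words} for each weighted field."""
    root = ET.fromstring(xml_text)
    terms = {}
    for field in FIELD_WEIGHTS:
        elem = root.find(field)  # direct child of the root element
        text = "".join(elem.itertext()) if elem is not None else ""
        terms[field] = text.lower().split()
    return terms


def score(xml_text, term, popularity=0.0):
    """Weighted term count, scaled by an external number such as an access count."""
    term = term.lower()
    base = sum(
        FIELD_WEIGHTS[field] * words.count(term)
        for field, words in field_terms(xml_text).items()
    )
    return base * (1.0 + popularity)


doc = (
    "<article><title>Fulltext search</title>"
    "<byline>Christian</byline>"
    "<body>Search the fulltext index</body></article>"
)
print(score(doc, "fulltext"))                  # 4.0 (3.0 from title + 1.0 from body)
print(score(doc, "fulltext", popularity=0.5))  # 6.0
```

Because the fields are extracted at indexing time, the index for one
document can be rebuilt as soon as that document changes, which is the
"up to date immediately" requirement above.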
> > There's Lucene (www.lucene.com) in Java that I think does exactly
> > what I want. Anyone want to help me port that to perl, or to C(++)
> > with perl bindings (-; ? (It should be ready in a few weeks, and
> > it's about 500k of source code :-().
> >
> > (Something in C/C++ that would be loaded as a UDF or so would be nice
> > too, but as I understand it (from the recent discussion about embedded
> > procedural languages) that's not possible, since these UDFs would have
> > to start other queries (e.g. to insert each word fragment into an
> > index table).)
> >
> > What are my current options? What do you use?
> > More info about mysql-4?
> >
> > Thx
> > Christian.
> >
>
>
_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
To unsubscribe, send a message to <[EMAIL PROTECTED]> with a
subject of unsubscribe
FAQ: http://htdig.sourceforge.net/FAQ.html