On Wed, 22 Jan 2003 12:22:09 +0200
Eli Marmor <[EMAIL PROTECTED]> wrote:
> By the way: What is the list's recommendation for a database for a
> dictionary?  (i.e. zillion records; English words are the keys; 99% of
> the activity is search and read and almost no update activity; When
> matching texts against the DB, the length is not known, and the longest
> key (that is matching) is taken; The time to find a match is critical).

Since you search mostly against a single index (the English words), any
SQL database looks like overkill. I would go for Berkeley DB and its ilk.
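
To illustrate the "longest matching key" lookup on top of a plain
key/value store, here is a minimal sketch in Python. It is not tied to
any particular database: `db` is assumed to be any mapping that supports
the `in` test (a plain dict stands in for the real store here), and the
names `longest_match` and `max_key_len` are made up for the example. It
simply tries the longest candidate first and shortens it until a key is
found, so the cost is at most `max_key_len` lookups per position.

```python
def longest_match(db, text, start, max_key_len):
    """Return the longest key of db that is a prefix of text[start:], or None.

    db is any mapping supporting `in` (an assumption of this sketch);
    max_key_len is the length of the longest key stored in it.
    """
    limit = min(max_key_len, len(text) - start)
    # Try the longest candidate first, then shorten one character at a time.
    for length in range(limit, 0, -1):
        candidate = text[start:start + length]
        if candidate in db:
            return candidate
    return None


# A plain dict stands in for the key/value database.
words = {"in": 1, "inter": 2, "internet": 3}
print(longest_match(words, "internetworking", 0, 8))
```

With a byte-oriented store the candidate would probably have to be
encoded before the lookup; that detail is left out here.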
 
> And a similar question: If I have a collection of hundreds (simple)
> regular expressions, and want to find all the matches of them in a long
> free text, is there any Open Source library for this purpose?  (like
> flex, but without generating C code + compilation to machine code; Just
> a function library).

There's libpcre (Perl Compatible Regular Expressions). I haven't checked
its limits on regex or text length (if there are any, other than virtual
memory). I assume, of course, that what you are looking for is matching
one or more *alternatives*, so it is equivalent to one huge regex
(re1|re2|re3|...). Of course the automaton may explode in space --
you'll have to check that yourself.
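
A rough sketch of the "one huge regex" idea, using Python's built-in
`re` module rather than libpcre itself (the syntax is similar but this is
not the libpcre API). Each pattern is wrapped in a named group so that a
match can be traced back to the pattern that produced it. The helper
names `build_combined` and `find_all` and the sample patterns are
invented for the example, and it assumes the individual patterns contain
no named groups or backreferences of their own.

```python
import re


def build_combined(patterns):
    """Join the patterns into one alternation, one named group per pattern."""
    parts = ["(?P<p%d>%s)" % (i, p) for i, p in enumerate(patterns)]
    return re.compile("|".join(parts))


def find_all(combined, text):
    """Return (group name, start offset, matched text) for each match."""
    return [(m.lastgroup, m.start(), m.group()) for m in combined.finditer(text)]


patterns = [r"[0-9]+", r"[A-Z][a-z]+", r"db"]
combined = build_combined(patterns)
for hit in find_all(combined, "Berkeley db 2003"):
    print(hit)
```

Note that an alternation yields only non-overlapping matches, and at each
position only the first alternative that matches. If overlapping matches
of different patterns are needed, each pattern has to be run separately.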

> (Currently I plan to build a state-machine for this purpose, but still
> looking for an existing library, as efficient as possible).

You'll have the same problem -- it depends on the type of automaton you
build. My guess is that a deterministic one would explode in memory space
with hundreds of regexes in parallel.
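
To give a feeling for how bad the deterministic case can get, here is a
small sketch of the textbook subset construction applied to a toy
automaton. It has nothing to do with your actual expressions: the NFA
below recognizes "the n-th symbol from the end is 'a'" over the alphabet
{a, b}, and the function names are made up for the example. The NFA
needs only n+1 states, while the number of reachable DFA states doubles
with every increment of n.

```python
from collections import deque


def nfa_step(state, symbol, n):
    """Transitions of a toy NFA for 'the n-th symbol from the end is a'."""
    if state == 0:
        # State 0 loops on everything and guesses on 'a' that the tail starts.
        return {0, 1} if symbol == "a" else {0}
    if state < n:
        return {state + 1}
    return set()  # state n is accepting and has no outgoing edges


def count_dfa_states(n):
    """Count the DFA states reachable via the subset construction."""
    start = frozenset({0})
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for symbol in "ab":
            target = frozenset(
                t for s in current for t in nfa_step(s, symbol, n)
            )
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return len(seen)


for n in (2, 4, 8, 12):
    print(n + 1, "NFA states ->", count_dfa_states(n), "DFA states")
```

For this family the count is 2**n, so a handful of such patterns is
already enough to make a fully expanded DFA impractical. Real expression
sets may behave much better or somewhat worse; this only shows that the
worst case is exponential.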

Hope it helps,

----------------------------------------------------------------
Oron Peled                             Voice/Fax: +972-4-8228492
[EMAIL PROTECTED]                  http://www.actcom.co.il/~oron

Linux lasts longer!
                        -- "Kim J. Brand" <[EMAIL PROTECTED]>
