On Wed, 22 Jan 2003 12:22:09 +0200
Eli Marmor <[EMAIL PROTECTED]> wrote:

> By the way: What is the list's recommendation for a database for a
> dictionary? (i.e. zillion records; English words are the keys; 99% of
> the activity is search and read and almost no update activity; When
> matching texts against the DB, the length is not known, and the longest
> key (that is matching) is taken; The time to find a match is critical).

Since you search mostly against one index (English words), any SQL
database looks like overkill. I would go for Berkeley DB and its ilk.

> And a similar question: If I have a collection of hundreds (simple)
> regular expressions, and want to find all the matches of them in a long
> free text, is there any Open Source library for this purpose? (like
> flex, but without generating C code + compilation to machine code; Just
> a function library).

There's libpcre (Perl Compatible Regular Expressions). I haven't checked
its limitations on regex or text length (if there are any besides VM).
I assume, of course, that what you are looking for is a match of one or
more *alternatives*, so it is equivalent to one huge regex
(re1|re2|re3|...). Of course, the automaton may explode in space --
you'll have to check that yourself.

> (Currently I plan to build a state-machine for this purpose, but still
> looking for an existing library, as efficient as possible).

You'll have the same problem -- it depends on the type of automaton you
build. My guess is that a deterministic one would explode in memory
with hundreds of regexes in parallel.

Hope it helps,

----------------------------------------------------------------
Oron Peled                             Voice/Fax: +972-4-8228492
[EMAIL PROTECTED]                  http://www.actcom.co.il/~oron

Linux lasts longer!
                -- "Kim J. Brand" <[EMAIL PROTECTED]>
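
A rough sketch of the "one huge regex" idea from the reply above, for
anyone who wants to experiment before committing to a library. It uses
Python's standard re module rather than libpcre, and the pattern names
and regexes are made up purely for illustration. Each pattern is wrapped
in a named group so that a match can be traced back to the pattern that
produced it.

```python
import re

# Hypothetical patterns, for illustration only (not from the original post).
PATTERNS = {
    "number": r"\d+",
    "word_ing": r"\b\w+ing\b",
    "email_like": r"[\w.]+@[\w.]+",
}


def compile_alternation(patterns):
    """Join the patterns into one regex: (?P<name1>re1)|(?P<name2>re2)|..."""
    parts = [f"(?P<{name}>{regex})" for name, regex in patterns.items()]
    return re.compile("|".join(parts))


def find_matches(combined, text):
    """Return (pattern name, matched text, start offset) for each match."""
    return [(m.lastgroup, m.group(), m.start()) for m in combined.finditer(text)]


COMBINED = compile_alternation(PATTERNS)

if __name__ == "__main__":
    for name, matched, offset in find_matches(COMBINED, "call 42 while testing"):
        print(name, matched, offset)
```

Two caveats. A single alternation reports non-overlapping matches only,
and at each position the first alternative that matches wins, so if you
really need every match of every pattern you would have to scan once per
pattern instead. Also, Python's re is a backtracking matcher, so this
sketch trades the memory blow-up of a deterministic automaton for extra
matching time.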
