Re: [Boston.pm] binary search on a list of sorted strings in memory

Ronald J Kimball Thu, 09 Jul 2009 07:18:07 -0700

On Wed, Jul 08, 2009 at 08:33:34PM -0400, Steve Tolkin wrote:

> I want to search a sorted array to see if which strings, if any,  have the
> same prefix as my search string.  For example, given the search string of
> "tempest" I want to find "tempests" and "tempestuous".   My word list has
> about 250,000 words.  Even though only about 7500 will be used to start as
> search I think brute force would be too slow.  It would take  O (n*n) time.
>
> Because I believe in measuring performance I actually wrote the brute force
> code.  It only uses 2500 strings to strart the search, rather than the 7500
> I plan to use.  In fact it is very slow.   It is still running 15 minutes
> later, and I can tell it is only half done.


I'm very surprised it is so slow.  I wrote code to test the brute force
approach and it searches ~250,000 words for 2500 strings in just 3 minutes.


> Alternatively, can someone recommend  another perl module for prefix
> searching on a moderately large set of strings, e.g. trie  or suffix tree or
> suffix array, etc..  It must be reliable and easy to use.

Another option would be a dictionary tree, in which each node is a single
letter, so each word is represented by a path through the tree.  I don't
know how suitable it would be for this purpose.  It is a cool structure
though.  :)

Ronald

_______________________________________________
Boston-pm mailing list
[email protected]
http://mail.pm.org/mailman/listinfo/boston-pm

Re: [Boston.pm] binary search on a list of sorted strings in memory

Reply via email to