On Thu, Jul 9, 2009 at 8:20 AM, Palit, Nilanjan<[email protected]> wrote: > > -----Original Message----- > From: Ronald J Kimball > Sent: Thursday, July 09, 2009 10:18 AM > Subject: Re: [Boston.pm] binary search on a list of sorted strings in memory > >> Another option would be a dictionary tree, in which each node is a single >> letter, so each word is represented by a path through the tree. I don't >> know how suitable it would be for this purpose. It is a cool structure >> though. :) > > Back from CS algorithm class, the Boyer-Moore algorithm seems ideally > suited for a fast pattern matching scheme that you are looking for. > Because it searches in reverse order and can quickly find & abandon > target strings that do *not* match, it seems its worth a look.
Boyer-Moore is a great way to speed up looking for a substring anywhere in another string. But when the substring has to be at the start of the other string, Boyer-Moore is *not* as efficient as the naive "compare the start". > Plenty of Google hits to describe it. > > Apparently, Perl's RE optimizer uses this for simple string searches > (http://mkweb.bcgsc.ca/intranet/perlbook/prog/ch24_02.htm). My understanding is that it is great with ASCII, but the transition tables get unwieldy with Unicode. Ben _______________________________________________ Boston-pm mailing list [email protected] http://mail.pm.org/mailman/listinfo/boston-pm

