I tried to explain in the O.P. It is search based on matching the string
prefix, and so is not "exact" search. I have a large set of target strings,
about 250,000. I have a set of search strings, about 7,500. Assume both
arrays are sorted. In pseudo-code (pseudo-perl):
S: for each $s in @search
T: for each $t in @target
if $t lt $s next T; # I can optimize this away by remembering where
to start
if $t !~ /^$s/ next S; # we have gone too far -- no chance of a match
anymore
if $t matches /^$s./ return $t # 3. my actual match logic code here is
more complex
Because it is prefix search having the arrays sorted is a big win. I may
use linear look-ahead rather than binary search, because the expected number
of matches at #3 will be small; I conjecture around 20.
Steve
N.B. I am replying to BPM even though this was sent just to me by Bill, to
help clarify what I am trying to do. Thanks to Ben Tilly and Uri Guttman
for their suggestions. More feedback later.
-----Original Message-----
From: Bill Ricker [mailto:[email protected]]
Sent: Thursday, July 09, 2009 8:09 AM
To: Steve Tolkin
Subject: Re: [Boston.pm] binary search on a list of sorted strings in memory
On Wed, Jul 8, 2009 at 8:33 PM, Steve Tolkin<[email protected]> wrote:
> I want to do a binary search on a list of strings in memory.
Why?
Binary Search is for something where record access is expensive so we
have to minimize number of accesses, and the file is maintained by MFU
so is always sorted 'for free'. Otherwise the sort is the expensive
operation. Unless your 'memory' is swapped, or this is a real time
issue (RAM=>CACHE is now 'expensive'), I'd use more sugary data
structure like hashing, if exact match is expected.
Do you really need to embrace the "cache is the new RAM, RAM is the
new Disk" meme?
If so, A assume 'list' means @ArrayOfString[]. The list was sorted
when you read it in? guaranteed sorted according to the 'lt' op you'll
use in bin search, and not human friendly Mc eq MC eq Mac eq Mac\x20 ?
--
Bill
[email protected] [email protected]
_______________________________________________
Boston-pm mailing list
[email protected]
http://mail.pm.org/mailman/listinfo/boston-pm