>>>>> "KS" == Kripa Sundar <[email protected]> writes:
  KS> I have a 900 Meg text file, containing random text. I also have a list
  KS> of 6000 names (alphanumeric strings) that occur in the random text.
  KS> I need to tag a prefix on to each occurrence of each of these 6000
  KS> names.

  KS> My premise:
  KS> I believe a regex would give the simplest and most efficient algorithm.
  KS> If I am mistaken, I would be happy to learn.

  KS> 2: my $regex = join "|", @names;

that will kill your cpu. alternations are very slow: at every position
in the text the regex engine has to go back and try the alternatives
from the beginning of the list, so you pay for up to 6000 attempts per
position.

one trick would be to find a way to grab candidate names in a generic
way and check whether each one is a key in a hash of names. without
sample data it is hard to show this in detail, but i will assume each
name is 2-3 'words' in the text.

the idea is to loop over the text's words and keep the most recent 2-3
of them (a simple shift register using an array works for this): push
in each new word and shift out the oldest one in a loop. then take that
list of words (you could try the first 2 and then all 3 to get most
name combos) and look them up in the hash of names. if found, edit the
file in place and continue.

you could read large blocks of text from the file in an outer loop and
keep a running buffer, so that a name split across two blocks is still
seen whole. this is how i do it in File::ReadBackwards to get lines
without knowing the boundaries in advance.

so this technique would scan the file only one time and use a fast hash
for lookups. it could actually run in minutes or less if done correctly.

uri

-- 
Uri Guttman  ------  [email protected]  --------  http://www.sysarch.com --
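To make the word-window idea above concrete, here is a minimal sketch in Perl. It is an illustration under stated assumptions, not code from this thread: the sub name `tag_names`, the `NAME:` prefix and the sample names are all hypothetical, names are assumed to be alphanumeric words separated by whitespace, and the sketch works on a string already in memory rather than reading the 900 Meg file in blocks or editing it in place.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): put $prefix in front of
# every occurrence of a known name in $text. The keys of %$names are the
# names, with their words joined by single spaces.
sub tag_names {
    my ($text, $names, $prefix, $max_words) = @_;

    # Split into alternating word / separator tokens. The capturing group
    # keeps the separators, so the original spacing and punctuation survive.
    my @tok = split /([^A-Za-z0-9]+)/, $text;

    my $out = '';
    my $i   = 0;
    while ($i < @tok) {
        # Even indexes hold words, odd indexes hold separators.
        if ($i % 2 == 0 && length $tok[$i]) {
            # Grow the window one word at a time and remember the longest
            # run of words that is a key in the hash.
            my $key  = $tok[$i];
            my $best = exists $names->{$key} ? $i : undef;
            for my $k (1 .. $max_words - 1) {
                my $w = $i + 2 * $k;
                # Stop at the end of the text, or when the words are joined
                # by anything other than plain whitespace.
                last if $w > $#tok || $tok[$w - 1] !~ /^\s+$/;
                $key .= ' ' . $tok[$w];
                $best = $w if exists $names->{$key};
            }
            if (defined $best) {
                $out .= $prefix . join('', @tok[$i .. $best]);
                $i = $best + 1;
                next;
            }
        }
        $out .= $tok[$i++];
    }
    return $out;
}

# Example call with made-up data.
my %names = map { $_ => 1 } ('John Smith', 'Ann', 'John Q Public');
print tag_names("see John Smith, and Ann.\n", \%names, 'NAME:', 3);
```

Each word costs at most `$max_words` hash lookups, however many names are in the list. For the real file, one option is to call this on each line or block as it is read, with the caveat that a name straddling a block boundary would be missed unless the tail of the previous block is carried over in a running buffer.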

