Re: [Boston.pm] Q: giant-but-simple regex efficiency

2011-02-06 Thread belg4mit

You assumed that \s will delimit the tokens. That's not the case (see the original message, the interesting data can occur anywhere). So you can't tokenize and do a simple hash lookup. If you benchmark 6000

Actually, I believe the OP said that there were still delimiters required, they just

Re: [Boston.pm] Q: giant-but-simple regex efficiency

2011-02-06 Thread Charlie

Given how you frame the problem, then the hash lookup isn't even an option! No question, 6000+ string searches will be slow vs. a trie. Given the varying requirements we all encounter, day-to-day, I think this is an interesting exercise. Thanks for sharing these modules, Ted. The OP

Re: [Boston.pm] Q: giant-but-simple regex efficiency

2011-02-06 Thread Ted Zlatanov

On Sun, 06 Feb 2011 11:49:56 -0500 Charlie creit...@rcn.com wrote:

C> Given how you frame the problem, then the hash lookup isn't even an
C> option! No question, 6000+ string searches will be slow vs. a trie.
C> Given the varying requirements we all encounter, day-to-day, I think
C> this is an

Re: [Boston.pm] Q: giant-but-simple regex efficiency

2011-02-06 Thread belg4mit

Too bad Text::Match::FastAlternatives's return values aren't more useful, e.g. the matched position. This with a /g equivalent and Bob's your uncle. ___ Boston-pm mailing list Boston-pm@mail.pm.org http://mail.pm.org/mailman/listinfo/boston-pm
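For comparison, Perl's built-in regex engine does report match positions when a pattern is applied repeatedly with /g. The following is only a sketch of that idea using core Perl; the helper name match_positions and the sample pattern are made up for illustration, and it says nothing about what Text::Match::FastAlternatives itself offers.

```perl
use strict;
use warnings;

# Collect [offset, matched text] pairs for every match of $re in $text.
sub match_positions {
    my ($text, $re) = @_;
    my @hits;
    while ($text =~ /$re/g) {
        # @- and @+ hold the start and end offsets of the last match.
        push @hits, [ $-[0], substr($text, $-[0], $+[0] - $-[0]) ];
    }
    return @hits;
}

# Hypothetical pattern and input, for illustration only.
for my $hit (match_positions("a foo and a bar", qr/foo|bar/)) {
    printf "%d: %s\n", @$hit;    # prints "2: foo" then "12: bar"
}
```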

Re: [Boston.pm] Q: giant-but-simple regex efficiency

2011-02-06 Thread Greg London

is faster. but not tokenizing gives your grammars more flexibility I think. -Original message- From: Charlie creit...@rcn.com To: Ted Zlatanov t...@lifelogs.com Cc: boston...@pm.org Sent: Sun, Feb 6, 2011 16:49:56 GMT+00:00 Subject: Re: [Boston.pm] Q: giant-but-simple regex efficiency

Re: [Boston.pm] Q: giant-but-simple regex efficiency

2011-02-05 Thread Martyn Peck

Hi. OK, I've been reading over the responses you've been getting and I just have to ask everyone: what's wrong with something like this? while($line=){ foreach my $name (@names){ $line =~ s/$name/prefix_$1/g; } } I know it seems kind of

Re: [Boston.pm] Q: giant-but-simple regex efficiency

2011-02-05 Thread Uri Guttman

MP == Martyn Peck m...@mwpnet.com writes:

MP> What's wrong with something like this:
MP> while($line=){
MP>   foreach my $name (@names){
MP>     $line =~ s/$name/prefix_$1/g;
MP>   }
MP> }

it is O( N^2 ) which is very slow for large data sets.

MP> I know it seems
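To make the cost concrete, here is a runnable sketch of the nested-loop approach under discussion. The sample names, lines, and the helper name tag_line_naive are hypothetical; the capture group and \Q...\E quoting are additions so that $1 is defined and metacharacters in names are treated literally.

```perl
use strict;
use warnings;

# Hypothetical sample data; the thread's real input is a ~900 MB file
# and a list of ~6000 names.
my @names = qw(foo bar baz);
my @lines = ("foo meets bar", "nothing here", "bazfoo");

# One substitution pass per name per line: the work grows as
# (number of lines) x (number of names).
sub tag_line_naive {
    my ($line, $prefix, @names) = @_;
    for my $name (@names) {
        # The parentheses capture the match so that $1 is defined.
        $line =~ s/(\Q$name\E)/$prefix$1/g;
    }
    return $line;
}

print tag_line_naive($_, 'prefix_', @names), "\n" for @lines;
```

Besides the repeated scanning, a later name can match inside text that an earlier pass already tagged, so overlapping names may end up tagged more than once.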

Re: [Boston.pm] Q: giant-but-simple regex efficiency

2011-02-05 Thread Alex Vandiver

At Fri Feb 04 18:53:09 -0500 2011, Uri Guttman wrote:

> that will kill your cpu. alternations are very slow since they have to go back and try from the beginning of the list each time.

Since we're talking about literals, this hasn't been true since 2007, with the release of perl 5.10. Perl now
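A single-pass version of the idea might look like the sketch below: all names are joined into one alternation of literals, which perl 5.10 and later can compile into a trie. The names, the prefix, and the helper tag_line are hypothetical, and sorting longest-first is an added assumption so that a longer name wins over a shorter one starting at the same position.

```perl
use strict;
use warnings;

# Hypothetical names; the real list in the thread has about 6000 entries.
my @names = qw(foo foobar baz);

# Quote each name so it is matched literally, longest names first.
my $alternation = join '|',
    map  { quotemeta($_) }
    sort { length($b) <=> length($a) || $a cmp $b } @names;
my $names_re = qr/($alternation)/;

# Tag every occurrence of any name in one pass over the line.
sub tag_line {
    my ($line, $prefix) = @_;
    $line =~ s/$names_re/$prefix$1/g;
    return $line;
}

print tag_line("foobar and foo, baz", 'prefix_'), "\n";
# prints "prefix_foobar and prefix_foo, prefix_baz"
```

Because each position in the text is considered once against the whole set, tagged text is never rescanned by a later name. Whether this is fast enough for the full data set would still need to be benchmarked.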

Re: [Boston.pm] Q: giant-but-simple regex efficiency

2011-02-05 Thread Charlie

Short answer, no, Perl regex will not build an optimal lookup of a token into your set of 6000 names. In general, if speed is the issue, do not use regex. It does not scale. Also, be clear on the 2 problems at hand: 1) tokenizing 1GB of input text and 2) adding a prefix to identified
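As a rough illustration of the tokenize-then-look-up approach, the sketch below splits on whitespace and checks each token against a hash. It assumes that every name appears as a whole whitespace-delimited token, which other replies in this thread dispute for the original data; the names and the helper tag_tokens are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical name set, stored as hash keys for constant-time lookup.
my %is_name = map { $_ => 1 } qw(foo bar baz);

sub tag_tokens {
    my ($line, $prefix) = @_;
    # The capturing group makes split return the whitespace runs too,
    # so the original spacing survives the round trip.
    my @parts = split /(\s+)/, $line;
    return join '', map { $is_name{$_} ? "$prefix$_" : $_ } @parts;
}

print tag_tokens("foo  qux bar foobar", 'prefix_'), "\n";
# prints "prefix_foo  qux prefix_bar foobar"
```

Note that "foobar" is left alone here, which is exactly the limitation raised elsewhere in the thread when names can be embedded in larger words.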

[Boston.pm] Q: giant-but-simple regex efficiency

2011-02-04 Thread Kripa Sundar

Hi folks, Problem: I have a 900 Meg text file, containing random text. I also have a list of 6000 names (alphanumeric strings) that occur in the random text. I need to tag a prefix on to each occurrence of each of these 6000 names. My premise: I believe a regex would give the simplest and most

Re: [Boston.pm] Q: giant-but-simple regex efficiency

2011-02-04 Thread Uri Guttman

KS == Kripa Sundar kripa.sun...@synopsys.com writes:

KS> I have a 900 Meg text file, containing random text. I also have a list
KS> of 6000 names (alphanumeric strings) that occur in the random text.
KS> I need to tag a prefix on to each occurrence of each of these 6000
KS> names.

KS> My

Re: [Boston.pm] Q: giant-but-simple regex efficiency

2011-02-04 Thread Greg London

To: boston-pm@mail.pm.org boston-pm@mail.pm.org Sent: Sat, Feb 5, 2011 00:53:35 GMT+00:00 Subject: Re: [Boston.pm] Q: giant-but-simple regex efficiency Thanks for the prompt replies, folks! Unfortunately, my names can be embedded in larger words of the input text, as long as they are delimited by certain

12 matches
