Hi Scott

Scott E Robinson wrote:
> Dear Perl experts,
>
> I'm trying to find the right regular expressions to do some simple (?)
> string processing. Can anyone tell me how to do these?
>
> 1. Given a string consisting of substrings delimited by colons, such
> as :B520:L201:M:M260:8:G607:,

  my $string = ':B520:L201:M:M260:8:G607:';

> how can I
> a. remove the single-character and all-numeric substrings (M
> and 8 in this example), leaving the rest alone?

This is an array of all the wanted substrings:

  my @wanted = $string =~ /[a-z]\w+/ig;
  print "@wanted\n";

output

  B520 L201 M260 G607

> b. remove all but the single-character and all-numeric
> substrings and leave the rest alone?

This is an array of all the wanted substrings (a single character, or a
run of digits of any length):

  my @wanted = $string =~ /\b(?:\w|\d+)\b/g;
  print "@wanted\n";

output

  M 8

> c. remove all but single- or double-character substrings and
> all-numeric substrings and leave the rest alone?

This is an array of all the wanted substrings:

  my @wanted = $string =~ /\b[a-z]\w?\b/ig;
  print "@wanted\n";

output

  M

(Note that this won't find '8M', but your data doesn't look like that
will come up.)

> The string will never have regex metacharacters in it, just mixed
> alpha and numeric, all-alpha, or all-numeric.

That makes things a lot easier. I'm also assuming they won't contain
underscores.

> The colons can stay.

I've removed them. If you want the array turned back into a string like
the one you started with, use

  my @wanted = $string =~ /[a-z]\w+/ig;
  print join ':', '', @wanted, '';

output

  :B520:L201:M260:G607:

> 2. Is there an easy way to count the number of substrings (the
> "chunks" between colons)?

The number of elements in one of the arrays is just the array in scalar
context:

  my $length = @wanted;

or, straight from one of the original strings:

  my $string = ':B520:L201:M:M260:8:G607:';
  my $length = () = $string =~ /\w+/g;
  print $length;

output

  6

> 3. This one is probably a bit difficult. I don't need to have it,
> but it would save me lots of effort if I had it. Given two strings
> of the same form as in no. 1, is there a regular expression which can
> compare the two and return the number of substring positions which
> have exact matches?
> I.e., given string 1 = :L000:W000:M:M260:G607:
> and string 2 = :L001:W000:M:M261:M260: can match the substrings and
> their positions and return the result "2" in this case? The M260
> substring is present in both but in different positions and shouldn't
> be counted as a match.

  my $string1 = ':L000:W000:M:M260:G607:';
  my $string2 = ':L001:W000:M:M261:M260:';

  my $matches = do {
    my %pair;
    # Map each substring of $string1 to the one at the same position in $string2
    @pair{$string1 =~ /\w+/g} = $string2 =~ /\w+/g;
    # Count the identical pairs (grep returns a count in scalar context)
    grep $pair{$_} eq $_, keys %pair;
  };

  print $matches;

output

  2

HTH,

Rob
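The hash in the answer to question 3 is keyed on the substrings of the first string, so a substring that appears twice there is only compared once, at its last position: for ':A:A:' against ':A:B:' it reports 0 rather than 1. Under `use warnings` it also warns about an uninitialized value when the second string has fewer chunks. If repeats can occur, a minimal index-by-index sketch could look like the one below. It assumes the chunks are still extracted with /\w+/, and the sub name count_positional_matches is hypothetical, not something from the post.

```perl
use strict;
use warnings;

# Hypothetical helper: count the positions at which two colon-delimited
# strings hold the same substring. Comparing index by index (instead of
# through a hash keyed on the substrings) keeps repeated substrings apart
# and never compares against a position that does not exist.
sub count_positional_matches {
    my ($s1, $s2) = @_;

    # Extract the chunks between the colons, in order.
    my @chunks1 = $s1 =~ /\w+/g;
    my @chunks2 = $s2 =~ /\w+/g;

    # Only positions present in both lists can match.
    my $last = $#chunks1 < $#chunks2 ? $#chunks1 : $#chunks2;

    # grep in scalar context returns the number of indices that pass.
    return scalar grep { $chunks1[$_] eq $chunks2[$_] } 0 .. $last;
}

my $string1 = ':L000:W000:M:M260:G607:';
my $string2 = ':L001:W000:M:M261:M260:';
print count_positional_matches($string1, $string2), "\n";    # prints 2

# A repeated substring is counted once per matching position.
print count_positional_matches(':A:A:', ':A:B:'), "\n";      # prints 1
```

Taking the shorter of the two index ranges means extra trailing chunks in either string are simply ignored rather than counted as mismatches.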