Hi Scott

Scott E Robinson wrote:
> Dear Perl experts,
>
> I'm trying to find the right regular expressions to do some simple (?)
> string processing.  Can anyone tell me how to do these?
>
> 1.  Given a string consisting of substrings delimited by colons, such
> as :B520:L201:M:M260:8:G607:,

    my $string = ':B520:L201:M:M260:8:G607:';

> how can I
>       a. remove the single-character and all-numeric substrings (M
> and 8 in this example), leaving the rest alone?

This is an array of all the wanted substrings:

    my @wanted = $string =~ /[a-z]\w+/ig;
    print "@wanted\n";

output

    B520 L201 M260 G607
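You might not want to depend on how the regex lines up with the colons. In that case you can split on the colons and filter each chunk with grep. This sketch should give the same list for your sample. It also copes with a chunk that starts with a digit, such as '8M' or '8AB', which the regex above would miss or only partly match:

```perl
use strict;
use warnings;

my $string = ':B520:L201:M:M260:8:G607:';

# split gives ('', 'B520', 'L201', 'M', 'M260', '8', 'G607'): the trailing
# empty field is dropped by split, and length > 1 discards the leading one.
# Keep chunks longer than one character that are not all digits.
my @wanted = grep { length($_) > 1 && !/^\d+$/ } split /:/, $string;
print "@wanted\n";
```

This prints the same B520 L201 M260 G607 as above.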

>       b. remove all but the single-character and all-numeric
> substrings and leave the rest alone?

This is an array of all the wanted substrings:

    my @wanted = $string =~ /\b(?:\d+|\w)\b/g;
    print "@wanted\n";

output

    M 8

>       c. remove all but single- or double-character substrings and
> all-numeric substrings and leave the rest alone?

This is an array of all the wanted substrings:

    my @wanted = $string =~ /\b(?:\d+|\w\w?)\b/g;
    print "@wanted\n";

output

    M 8

(This also keeps a two-character mix such as '8M', since that counts as
a double-character substring.)
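Parts b and c can also be done without relying on \b. Split on the colons and test each chunk's length and content. Here is a sketch along those lines. The variable names are just ones I've picked for illustration:

```perl
use strict;
use warnings;

my $string = ':B520:L201:M:M260:8:G607:';

# Drop the empty field that split produces before the leading colon.
my @chunks = grep { length } split /:/, $string;

# b. keep single-character or all-digit chunks
my @short_or_numeric = grep { length($_) == 1 || /^\d+$/ } @chunks;

# c. keep one- or two-character chunks, or all-digit chunks
my @tiny_or_numeric = grep { length($_) <= 2 || /^\d+$/ } @chunks;

print "@short_or_numeric\n";
print "@tiny_or_numeric\n";
```

For your sample both lines print M 8. They would differ for a two-character chunk such as 'AB', which only the second filter keeps.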

>   The string will never have regex metacharacters in it, just mixed
> alpha and numeric, all-alpha, or all-numeric.

That makes things a lot easier. I'm also assuming they won't contain
underscores, since \w matches those as well.

> The colons can stay.

My regexes drop them. If you want the array turned back into a
colon-delimited string like the one you started with, use

    my @wanted = $string =~ /[a-z]\w+/ig;
    print join ':', '', @wanted, '';

output

    :B520:L201:M260:G607:
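Since the colons can stay, another option is to delete the unwanted chunks from the string itself with a substitution. The sketch below removes each colon together with a following single-character or all-digit chunk. The lookahead checks for the next colon without consuming it. That leaves the colon available for the next match, so back-to-back unwanted chunks are all caught. It assumes, as in your sample, that every chunk is followed by a colon:

```perl
use strict;
use warnings;

my $string = ':B520:L201:M:M260:8:G607:';

# Copy the string, then delete every ":chunk" where the chunk is all digits
# or a single character, provided another colon follows it.
(my $kept = $string) =~ s/:(?:\d+|\w)(?=:)//g;
print "$kept\n";
```

This prints :B520:L201:M260:G607: and leaves $string unchanged.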

> 2.  Is there an easy way to count the number of substrings (the
> "chunks" between colons)?

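The number of substrings is just one of the arrays in scalar context:

    my $length = @wanted;

or, counting straight from the original string (assigning to the empty
list () forces list context, and a list assignment in scalar context
gives the number of elements on its right-hand side, i.e. the number of
matches):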

    my $string = ':B520:L201:M:M260:8:G607:';
    my $length = () = $string =~ /\w+/g;
    print $length;

output

    6
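There are a couple of other ways to get the same count. You can count the non-empty fields that split produces. Alternatively, you can count the colons with tr///, which returns the number of characters it matched. The colon trick assumes every chunk is wrapped in colons, as in your sample:

```perl
use strict;
use warnings;

my $string = ':B520:L201:M:M260:8:G607:';

# grep in scalar context returns how many chunks passed the test.
my $count = grep { length } split /:/, $string;

# N chunks wrapped in colons means N + 1 colons.
my $by_colons = ($string =~ tr/://) - 1;

print "$count $by_colons\n";
```

Both give 6 here.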

> 3.  This one is probably a bit difficult.  I don't need to have it,
> but it would save me lots of effort if I had it.  Given two strings
> of the same form as in no. 1, is there a regular expression which can
> compare the two and return the number of substring positions which
> have exact matches? I.e., given string 1 = :L000:W000:M:M260:G607:
> and string 2 = :L001:W000:M:M261:M260: can match the substrings and
> their positions and return the result "2" in this case?  The M260
> substring is present in both but in different positions and shouldn't
> be counted as a match.

    my $string1 = ':L000:W000:M:M260:G607:';
    my $string2 = ':L001:W000:M:M261:M260:';

    my $matches = do {
        my %pair;
        @pair{$string1 =~ /\w+/g} = $string2 =~ /\w+/g;
        grep $pair{$_} eq $_, keys %pair;
    };
    print $matches;

output

    2
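One thing to watch with the hash version: if the same substring occurs more than once in $string1, the later position overwrites the earlier one in %pair. Also, if $string2 has fewer chunks, the missing values are undef, which warns under use warnings. A sketch that compares the two lists position by position avoids both problems. positional_matches is just a name I've made up for it:

```perl
use strict;
use warnings;

# Count the positions at which two colon-delimited strings hold the same
# chunk. Every position is kept, so a repeated chunk is not overwritten.
sub positional_matches {
    my ($s1, $s2) = @_;
    my @first  = $s1 =~ /\w+/g;
    my @second = $s2 =~ /\w+/g;

    # Only compare the positions both strings actually have.
    my $last = $#first < $#second ? $#first : $#second;
    return scalar grep { $first[$_] eq $second[$_] } 0 .. $last;
}

print positional_matches(':L000:W000:M:M260:G607:', ':L001:W000:M:M261:M260:'), "\n";
print positional_matches(':M:M:', ':M:X:'), "\n";
```

The first call prints 2, as before. The second prints 1, where the hash version would print 0 because both M chunks share one hash key.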

HTH,

Rob



