Wijaya Edward wrote on Thursday, 27 April 2006, 02:51:
> Hi,
> I have two strings that I want to compute the number of mismatches between
> them. These two strings are of the "same" size. Let's call them 'source'
> string and 'target' string. Now, the problem is that the 'source' and
> 'target' string may come in ambiguous form, meaning that in one position
> they may contain more than 1 (up to 4) characters. The ambiguous position is
> marked with square bracketed [ATCG] region. The example is as follows:
>
> Example 1 (where the source is ambiguous):
>
> my $source1  = '[TCG]GGGG[AT]'; # ambiguous
> my $target1   = 'AGGGGC'; # No of mismatch = 2  on position 1 and 6
> my $target2  = 'TGGGGC'; # No of mismatch = 1  on position 6 only
>
>
> Example 2 (where the source is NOT ambiguous):
>
> my $source2  =  'TGGGGT'; # not-ambiguous
> my $target1  = 'AGGGGC'; # No of mismatch = 2  on position 1 and 6
> my $target3  = 'TGGGGT'; # No of mismatch = 0  all position matches
>
>
> Example 3 (where both source and target are ambiguous)
> my $source1  = '[TCG]GGGG[AT]'; # ambiguous
> my $target1   = 'AGGGG[CT]';         # ambiguous, no of mismatch = 1, only at position 1
>
> For example I can use bitwise operator to do it.
>
> I have no problem when dealing with Example 1 and 2 above.
> But I'm stuck with example 3, where both source and target are ambiguous.
>
>
> Here is the current snippet I have, which doesn't do the job:
>
> __BEGIN__
> sub mismatches {
>     my($source, $target) = @_;
>     my @sparts = ($source =~ /(\[.*?\]|.)/g);
>     my @tparts = ($target =~ /(\[.*?\]|.)/g);
>
>     scalar grep $tparts[$_] !~ /^$sparts[$_]/, 0 .. $#sparts;
>   }
> __END__
>
> Where did I go wrong? I humbly seek advice.

Hello Edward

Here is one way to do it.
I didn't test it thoroughly, but it demonstrates the alternative approach of
comparing every position in the source and target:

#!/usr/bin/perl
use strict;
use warnings;

sub mismatches {
        my ($source, $target)=@_;

        # split source and target into single positions
        #
        my @spos=$source=~/((?:\[.+?\])|.)/g;
        my @tpos=$target=~/((?:\[.+?\])|.)/g;

        # debug info
        #
        warn "source positions: ", (join ',', @spos), "\n";
        warn "target positions: ", (join ',', @tpos), "\n";

        my $mm=0; # number of mismatches
        my @mmp; # mismatch positions

        # calculate number of mismatches and their positions
        #
        do { $mm++, push @mmp,$_ if $spos[$_]!~qr/$tpos[$_]/ } for 0..$#spos;

        # debug info
        #
        warn "$mm mismatch(es) at positions @mmp;\n";

        $mm;
}

mismatches('[TCG]GGGG[AT]', 'AGGGGC');
mismatches('[TCG]GGGG[AT]', 'TGGGGC');
mismatches('[TCG]GGGG[AT]', 'AGGGG[CT]');
mismatches('[TCG]GG[CT]G[AT]', 'AGGGG[CT]');
mismatches('[TCG]GG[CT]G[AT]', 'AGGG[AC][CT]');

__END__

source positions: [TCG],G,G,G,G,[AT]
target positions: A,G,G,G,G,C
2 mismatch(es) at positions 0 5;
source positions: [TCG],G,G,G,G,[AT]
target positions: T,G,G,G,G,C
1 mismatch(es) at positions 5;
source positions: [TCG],G,G,G,G,[AT]
target positions: A,G,G,G,G,[CT]
1 mismatch(es) at positions 0;
source positions: [TCG],G,G,[CT],G,[AT]
target positions: A,G,G,G,G,[CT]
2 mismatch(es) at positions 0 3;
source positions: [TCG],G,G,[CT],G,[AT]
target positions: A,G,G,G,[AC],[CT]
3 mismatch(es) at positions 0 3 4;
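
A note on why the quoted snippet miscounts Example 3: as far as I can tell, the
culprit is the ^ anchor in /^$sparts[$_]/. When the target position is itself
bracketed, its first character is '[', which is never a member of the source's
character class, so an ambiguous target position is always reported as a
mismatch. A small check that illustrates this (the variable names are made up
for the illustration):

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $spart = '[AT]';   # ambiguous source position
my $tpart = '[CT]';   # ambiguous target position; shares 'T' with the source

# Anchored, as in the quoted snippet: the first character of '[CT]' is '[',
# which is not in the class [AT], so the match fails.
my $anchored   = $tpart =~ /^$spart/ ? 'match' : 'mismatch';

# Unanchored: the regex engine may find the shared 'T' anywhere in the string.
my $unanchored = $tpart =~ /$spart/  ? 'match' : 'mismatch';

print "anchored:   $anchored\n";
print "unanchored: $unanchored\n";
```

Dropping the anchor works here only because the brackets themselves never
appear inside a character class.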


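Since the original question mentions bitwise operators, here is a minimal
sketch of that idea: encode each position as a 4-bit mask (one bit per base)
and count the positions where the AND of the two masks is zero. The names
%BIT, masks and mismatches_bitwise are hypothetical, and the sketch assumes
well-formed input over A, C, G and T only, with both strings covering the same
number of positions.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# One bit per base (assumed alphabet: A, C, G, T only).
my %BIT = (A => 1, C => 2, G => 4, T => 8);

# Turn a string such as '[TCG]GGGG[AT]' into a list of bitmasks,
# one mask per position.
sub masks {
    my ($seq) = @_;
    my @masks;
    for my $pos ($seq =~ /(\[[ACGT]+\]|[ACGT])/g) {
        my $mask = 0;
        $mask |= $BIT{$_} for $pos =~ /[ACGT]/g;   # set one bit per listed base
        push @masks, $mask;
    }
    return @masks;
}

# Count the positions where source and target have no base in common.
sub mismatches_bitwise {
    my ($source, $target) = @_;
    my @s = masks($source);
    my @t = masks($target);
    die "sequences differ in length\n" unless @s == @t;
    return scalar grep { !($s[$_] & $t[$_]) } 0 .. $#s;
}

print mismatches_bitwise('[TCG]GGGG[AT]', 'AGGGG[CT]'), "\n";   # prints 1
```

Unlike interpolating one string into a regex, this treats source and target
symmetrically, so it does not matter which of the two is ambiguous.
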
hope this helps

Dani
