Wijaya Edward wrote on Thursday, 27 April 2006, 02:51:
> Hi,
> I have two strings that I want to compute the number of mismatches between
> them. These two strings are of the "same" size. Let's call them 'source'
> string and 'target' string. Now, the problem is that the 'source' and
> 'target' string may come in ambiguous form, meaning that in one position
> they may contain more than 1 (up to 4) characters. The ambiguous position is
> marked with square bracketed [ATCG] region. The example is as follows:
>
> Example 1 (where the source is ambiguous):
>
> my $source1 = '[TCG]GGGG[AT]'; # ambiguous
> my $target1 = 'AGGGGC';        # No of mismatch = 2 on position 1 and 6
> my $target2 = 'TGGGGC';        # No of mismatch = 1 on position 6 only
>
>
> Example 2 (where the source is NOT ambiguous):
>
> my $source2 = 'TGGGGT';        # not-ambiguous
> my $target1 = 'AGGGGC';        # No of mismatch = 2 on position 1 and 6
> my $target3 = 'TGGGGT';        # No of mismatch = 0 all position matches
>
>
> Example 3 (where both source and target are ambiguous)
> my $source1 = '[TCG]GGGG[AT]'; # ambiguous
> my $target1 = 'AGGGG[CT]';     # ambiguous, no of mismatch = 1 only
>                                # at position 1
>
> For example I can use bitwise operator to do it.
>
> I have no problem when dealing with Example 1 and 2 above.
> But I'm stuck with example 3, where both source and target is ambiguous.
>
>
> Here is the current snippet I have, which doesn't do the job:
>
> __BEGIN__
> sub mismatches {
>     my($source, $target) = @_;
>     my @sparts = ($source =~ /(\[.*?\]|.)/g);
>     my @tparts = ($target =~ /(\[.*?\]|.)/g);
>
>     scalar grep $tparts[$_] !~ /^$sparts[$_]/, 0 .. $#sparts;
> }
> __END__
>
> Where did I go wrong? I humbly seek advice.
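
As a side note on "Where did I go wrong?": a small sketch of what the grep condition in the quoted snippet seems to do for a single position. The helper name position_mismatch is made up for this illustration. It only isolates the test $tparts[$_] !~ /^$sparts[$_]/ so that the behaviour with an ambiguous target becomes visible.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: the comparison from the quoted snippet, applied to
# one position. $spart is used as the pattern, $tpart as the text.
sub position_mismatch {
    my ($spart, $tpart) = @_;
    return $tpart !~ /^$spart/ ? 1 : 0;
}

# Unambiguous target: the class [AT] is tested against the letter 'T'.
print position_mismatch('[AT]', 'T'), "\n";      # 0, a match

# Unambiguous target: the class [AT] is tested against the letter 'C'.
print position_mismatch('[AT]', 'C'), "\n";      # 1, a real mismatch

# Ambiguous target: the class [AT] is tested against the text '[CT]'.
# The first character of that text is the literal '[', which is neither
# 'A' nor 'T', so a mismatch is reported although both sides allow 'T'.
print position_mismatch('[AT]', '[CT]'), "\n";   # 1, a false mismatch
```

So for example 3 the last position is counted as a mismatch as well, and the snippet returns 2 instead of 1.
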
Hello Edward

Here is one way to do it. I didn't test it thoroughly, but it demonstrates the
alternative approach of comparing every position in the source and target:

#!/usr/bin/perl

use strict;
use warnings;

sub mismatches {
  my ($source, $target)=@_;

  # split source and target into single positions
  #
  my @spos=$source=~/((?:\[.+?\])|.)/g;
  my @tpos=$target=~/((?:\[.+?\])|.)/g;

  # debug info
  #
  warn "source positions: ", (join ',', @spos), "\n";
  warn "target positions: ", (join ',', @tpos), "\n";

  my $mm=0; # number of mismatches
  my @mmp;  # mismatch positions (counted from 0)

  # calculate number of mismatches and their positions
  # (the target position serves as the pattern, the source position as
  # the text it is matched against)
  #
  do { $mm++, push @mmp,$_ if $spos[$_]!~qr/$tpos[$_]/ } for 0..$#spos;

  # debug info
  #
  warn "$mm mismatch(es) at positions @mmp;\n";

  $mm;
}

mismatches('[TCG]GGGG[AT]', 'AGGGGC');
mismatches('[TCG]GGGG[AT]', 'TGGGGC');
mismatches('[TCG]GGGG[AT]', 'AGGGG[CT]');
mismatches('[TCG]GG[CT]G[AT]', 'AGGGG[CT]');
mismatches('[TCG]GG[CT]G[AT]', 'AGGG[AC][CT]');

__END__

source positions: [TCG],G,G,G,G,[AT]
target positions: A,G,G,G,G,C
2 mismatch(es) at positions 0 5;
source positions: [TCG],G,G,G,G,[AT]
target positions: T,G,G,G,G,C
1 mismatch(es) at positions 5;
source positions: [TCG],G,G,G,G,[AT]
target positions: A,G,G,G,G,[CT]
1 mismatch(es) at positions 0;
source positions: [TCG],G,G,[CT],G,[AT]
target positions: A,G,G,G,G,[CT]
2 mismatch(es) at positions 0 3;
source positions: [TCG],G,G,[CT],G,[AT]
target positions: A,G,G,G,[AC],[CT]
3 mismatch(es) at positions 0 3 4;

Hope this helps

Dani
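
Since the question mentions bitwise operators, here is a minimal sketch of that idea as well. It is not the code from the reply above but a swapped-in technique: each position becomes a bit mask with one bit per base, and two positions mismatch when the masks have no bit in common. The names %BIT, to_masks and count_mismatches and the bit values are assumptions made for this example, and the sketch assumes the sequences contain only A, C, G, T and well-formed [..] groups.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumed encoding: one bit per base.
my %BIT = (A => 1, C => 2, G => 4, T => 8);

# Turn a sequence such as 'AGG[CT]' into a list of bit masks,
# one mask per position.
sub to_masks {
    my ($seq) = @_;
    my @masks;
    for my $part ($seq =~ /(\[[ACGT]+\]|[ACGT])/g) {
        my $mask = 0;
        # Set the bit of every base at this position; brackets are skipped.
        $mask |= $BIT{$_} for grep { exists $BIT{$_} } split //, $part;
        push @masks, $mask;
    }
    return @masks;
}

# Count the positions at which source and target share no base.
sub count_mismatches {
    my ($source, $target) = @_;
    my @s = to_masks($source);
    my @t = to_masks($target);
    die "sequences differ in length\n" unless @s == @t;
    return scalar grep { !($s[$_] & $t[$_]) } 0 .. $#s;
}

print count_mismatches('[TCG]GGGG[AT]', 'AGGGGC'), "\n";     # 2
print count_mismatches('[TCG]GGGG[AT]', 'AGGGG[CT]'), "\n";  # 1
```

Because both sides are converted to masks in the same way, it does not matter whether the source, the target, or both are ambiguous.
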