Dear John,

Thanks so much for your life saving response.
There are one minor issue I still couldn't solve.

It is the fact that when the bounded region marked
by the array may occur more than once.
(See example no. 7 and 8 in my code below)

To disambiguate the situation, I can give the array that
comes along with the index.

I tried to modify your code below to handle
the matters. But I still cannot solve it.
I think I'm almost there but not quite yet.

Can you advice, how can I go about it?
Thanks so much beforehand. Really hope to hear
from you again.

__BEGIN__
my $t1 ='CCCATCTGTCCTTATTTGCTG';  my @ar1 = qw(ATCTG-3 ATTTG-13);
my $t2 ='ACCCATCTGTCCTTGGCCAT';   my @ar2 = qw(CCATC-2);
my $t3 ='CCACCAGCACCTGTC';        my @ar3 = qw(CCACC-0 CCAGC-3 GCACC-6);
my $t4 ='CCCAACACCTGCTGCCT';      my @ar4 = qw(CCAAC-1 ACACC-4);
my $t5 ='CTGGGTATGGGT';           my @ar5 = qw(GTATG-4 TGGGT-1);
my $t6 = 'AGGAACTTGCCTGTACCACAGGAAG'; my @ar6 = qw( CAGGA-18 AGGAA-19 );

#The above example should yield the same result as previously

# These two examples below are the 'ambiguous' cases.

my $t7 = 'CAGGACTTGCCTGTACCACAGGAAG'; my @ar7 = qw( CAGGA-18  );
my $t8 = 'CAGGATTTGAGGAAGTACCACAGGAAG'; my @ar8 = qw( CAGGA-18 AGGAA-19 );

# Answer 7 -- CAGGACTTGCCTGTACCA[CAGGA]AG Instead of -- [CAGGA]CTTGCCTGTACCACAGGAAG # Answer 8 -- CAGGATTTGAGGAAGTACCA[CAGGAA]G Instead of -- [CAGGA]TTTG[AGGAA]GTACCACAGGAAG


print put_bracket_jk_idx($t8,[EMAIL PROTECTED]),"\n";

sub put_bracket_jk_idx {
    my ( $str, $ar ) = @_;

    for my $subs ( @$ar ) {

         my ($sb,$id) = split("-",$subs);
         print "$sb $id\n";

        if ( substr( $str, $id ) =~ /$subs/i ) {
            $id += $-[ 0 ];
            substr( $str, $id, length $subs ) =~ tr/A-Z/a-z/;
            }
        }
    $str =~ s/([a-z]+)/[\U$1\E]/g;

    return $str;
    }


print "\n";

__END__


--
Regards,
Edward WIJAYA
SINGAPORE

On Fri, 28 Oct 2005 18:42:11 +0800, John W. Krahn <[EMAIL PROTECTED]> wrote:

Wijaya Edward wrote:

I have the following problem.
I am trying to put the bracket in a string given the set of its substrings.
Those bracketed region is "bounded" by the given substrings.
Like this,  given input "String" and it's "substrings"

    String
1.CCCATCTGTCCTTATTTGCTG
2.ACCCATCTGTCCTTGGCCAT
3.CCACCAGCACCTGTC
4.CCCAACACCTGCTGCCT
5.CTGGGTATGGGT
6.AGGAACTTGCCTGTACCACAGGAAG

 Substrings:
 1. ATCTG ATTTG
 2. CCATC
 3. CCACC CCAGC GCAAC
 4. CCAAC ACACC
 5. GTATG TGGGT
 6. CAGGA AGGAA

The desired answer are:
1. CCC[ATCTG]TCCTT[ATTTG]CTG
2. AC[CCATC]TGTCCTTGGCCAT
3. [CCACCAGCACC]TGTC   *
4. C[CCAACACC]TGCTGCCT *
5. CTGG[GTATGGGT]      **
6. AGGAACTTGCCTGTACCA[CAGGAA]G **

Please note that  in example 3 and 4 the substrings are "overlapping".
Pay attention also to for example 5 and 6, there exist substrings that occur
twice. So the answer for example 5 and 6 are NOT

5. C[TGGGTATGGGT]   ----this is wrong
6. [AGGAA]CTTGCCTGTACCA[CAGGAA]G  ----this is wrong
--
Regards,
Edward WIJAYA
SINGAPORE>


--
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
<http://learn.perl.org/> <http://learn.perl.org/first-response>


Reply via email to