At 6:05 pm -0200 17/1/05, you wrote:

Thanks Andrew for your input! But the script still gives me the result for the total number of times they appear in the text. What I need now is to get the results for individual blocks, something like this:

input file
Sequence Contig3772
Assembled_from  CR05-C1-102-004-_A01_-CT.F_008.ab1  -40  955
Assembled_from  CR05-C1-102-006-_E05_-CT.F_035.ab1  -40  972
Assembled_from  CR05-C1-102-004-_B01_-CT.F_007.ab1  -32  1007
Assembled_from  CR05-C1-103-033-_G08_-CT.F_026.ab1  397  1400
Assembled_from  CR05-C1-102-060-_D07_-CT.F_029.ab1  403  1450
Assembled_from  CR05-C1-102-008-_G03_-CT.F_010.ab1  404  1427
Assembled_from  CR05-C1-102-065-_F12_-CT.F_043.ab1  406  1498


Sequence Contig3773
Assembled_from  CR05-C1-103-041-_E11_-CT.F_044.ab1  -694  275
Assembled_from  CR05-C1-102-019-_A11_-CT.F_048.ab1  -626  289
Assembled_from  CR05-C1-102-019-_D03_-CT.F_013.ab1  -625  314
Assembled_from  CR05-C1-102-019-_B11_-CT.F_047.ab1  -733  185


Apologies first of all for my original useless response. Here's how I would do it -- and it works.

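        while (<>) {
          # A "Sequence ContigNNNN" line starts a new block: remember its number.
          # (No eval "my ..." is needed; the hash springs into existence on first use.)
          /Contig([0-9]+)/i and $hash = $1;
          # Count the matched prefix in the global hash named after the contig, e.g. %3772.
          /CR05-C1-102|CR05-C1-103/i and eval "\$$hash\{\$&\} += 1";
        }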

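Every time a "Sequence ContigNNNN" line is encountered, the counts start going into a hash named after that contig number (%3772, %3773 and so on). When a CR05-C1-102 match is found, the element keyed by the matched text, e.g. $3772{'CR05-C1-102'}, is incremented, and likewise for CR05-C1-103.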

Using the above contents for your ("\n" delimited) file, you can run the script and then test the results, as below. How you decide to name the keys etc. is up to you.

## TEST
print  qq~$3772{'CR05-C1-102'} $3772{'CR05-C1-103'}~;
# Result: 6 1
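
If you would rather avoid the string eval and the numerically named hashes, the same per-block count can be kept in a single hash of hashes. What follows is only a sketch under a few assumptions: the names count_by_contig and %count are made up for illustration, the prefixes are matched case-sensitively, and the sample data is read from an in-memory string rather than from your file.

```perl
use strict;
use warnings;

# Count library prefixes per contig block.
# Takes an open filehandle; returns a hash ref of the form
# { contig_number => { prefix => count } }.
# (count_by_contig is a hypothetical name, not part of the script above.)
sub count_by_contig {
    my ($fh) = @_;
    my (%count, $contig);
    while (my $line = <$fh>) {
        # A "Sequence ContigNNNN" line starts a new block.
        $contig = $1 if $line =~ /Contig([0-9]+)/i;
        next unless defined $contig;
        # Count every prefix on the line, so a block squeezed onto one line still works.
        $count{$contig}{$1}++ while $line =~ /(CR05-C1-10[23])/g;
    }
    return \%count;
}

# The sample input from this thread, opened as an in-memory filehandle.
my $sample = <<'END';
Sequence Contig3772
Assembled_from  CR05-C1-102-004-_A01_-CT.F_008.ab1  -40  955
Assembled_from  CR05-C1-102-006-_E05_-CT.F_035.ab1  -40  972
Assembled_from  CR05-C1-102-004-_B01_-CT.F_007.ab1  -32  1007
Assembled_from  CR05-C1-103-033-_G08_-CT.F_026.ab1  397  1400
Assembled_from  CR05-C1-102-060-_D07_-CT.F_029.ab1  403  1450
Assembled_from  CR05-C1-102-008-_G03_-CT.F_010.ab1  404  1427
Assembled_from  CR05-C1-102-065-_F12_-CT.F_043.ab1  406  1498

Sequence Contig3773
Assembled_from  CR05-C1-103-041-_E11_-CT.F_044.ab1  -694  275
Assembled_from  CR05-C1-102-019-_A11_-CT.F_048.ab1  -626  289
Assembled_from  CR05-C1-102-019-_D03_-CT.F_013.ab1  -625  314
Assembled_from  CR05-C1-102-019-_B11_-CT.F_047.ab1  -733  185
END
open my $fh, '<', \$sample or die "Cannot open sample: $!";
my $count = count_by_contig($fh);
close $fh;

# One report line per contig, in numeric order.
for my $contig (sort { $a <=> $b } keys %$count) {
    my $c = $count->{$contig};
    print "Contig$contig: ",
        join(', ', map { "$_ = $c->{$_}" } sort keys %$c), "\n";
}
# Prints:
# Contig3772: CR05-C1-102 = 6, CR05-C1-103 = 1
# Contig3773: CR05-C1-102 = 3, CR05-C1-103 = 1
```

Keeping everything in one structure means you can loop over all the contigs without knowing their numbers in advance, and it runs under "use strict". To use it on a real file, open that file and pass its handle in place of the in-memory one.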

JD








