At 6:05 pm -0200 17/1/05, you wrote:
Thanks Andrew for your input! But the script still gives me the result for the total number of times they appear in the text. What I need now is to get the results for individual blocks, something like this:
input file:
Sequence Contig3772
Assembled_from CR05-C1-102-004-_A01_-CT.F_008.ab1 -40 955
Assembled_from CR05-C1-102-006-_E05_-CT.F_035.ab1 -40 972
Assembled_from CR05-C1-102-004-_B01_-CT.F_007.ab1 -32 1007
Assembled_from CR05-C1-103-033-_G08_-CT.F_026.ab1 397 1400
Assembled_from CR05-C1-102-060-_D07_-CT.F_029.ab1 403 1450
Assembled_from CR05-C1-102-008-_G03_-CT.F_010.ab1 404 1427
Assembled_from CR05-C1-102-065-_F12_-CT.F_043.ab1 406 1498
Sequence Contig3773
Assembled_from CR05-C1-103-041-_E11_-CT.F_044.ab1 -694 275
Assembled_from CR05-C1-102-019-_A11_-CT.F_048.ab1 -626 289
Assembled_from CR05-C1-102-019-_D03_-CT.F_013.ab1 -625 314
Assembled_from CR05-C1-102-019-_B11_-CT.F_047.ab1 -733 185
Apologies first of all for my original useless response. Here's how I would do it -- and it works.
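while (<>) {
  # a Contig line starts a new block: name the hash after its number
  /Contig([0-9]+)/i and $hash=$1 and eval "my \%$hash";
  # count each 102/103 read under the matched prefix
  /CR05-C1-102|CR05-C1-103/i and eval "\$$hash\{\$&\} += 1";
}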
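Every time a ...Contignnnn line is encountered, a new hash named after the contig number comes into use (%3772 for Contig3772). When a CR05-C1-102 or CR05-C1-103 match is found, the entry keyed by the matched text is incremented, e.g. $3772{'CR05-C1-102'}.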
Using the above contents for your ("\n" delimited) file, you can run the script and then test the results, as below. How you decide to name the keys etc. is up to you.
## TEST
print qq~$3772{'CR05-C1-102'} $3772{'CR05-C1-103'}~;
# Result: 6 1
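If you'd rather not build variable names with string evals, a hash of hashes gives the same per-block counts and runs under strict. This is only a minimal sketch, assuming the same one-record-per-line file as above; the sub name count_by_contig and the variable names are mine, made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper (name is illustrative): reads lines from a filehandle
# and returns a reference to { contig_number => { prefix => count } }.
sub count_by_contig {
    my ($fh) = @_;
    my (%count, $contig);
    while (my $line = <$fh>) {
        # A "Sequence ContigNNNN" line starts a new block
        if ($line =~ /Contig([0-9]+)/i) {
            $contig = $1;
            next;
        }
        # Count 102/103 reads under the current contig, once one has been seen
        if (defined $contig and $line =~ /(CR05-C1-10[23])/i) {
            $count{$contig}{$1}++;
        }
    }
    return \%count;
}

# Only runs when a file name is given on the command line
if (@ARGV) {
    open my $fh, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    my $count = count_by_contig($fh);
    for my $contig (sort { $a <=> $b } keys %$count) {
        for my $prefix (sort keys %{ $count->{$contig} }) {
            print "Contig$contig $prefix $count->{$contig}{$prefix}\n";
        }
    }
}
```

Run it with the file name as its argument. The counts for Contig3772 then live in $count->{3772}{'CR05-C1-102'} and $count->{3772}{'CR05-C1-103'}, which should match the 6 and 1 from the test above.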
JD