Thanks Andrew for your input!
But the script still gives me the result for the total number of times they appear in the text. What I need now is to get the results for individual blocks, something like this:
input file
Sequence Contig3772
Assembled_from CR05-C1-102-004-_A01_-CT.F_008.ab1 -40 955
Assembled_from CR05-C1-102-006-_E05_-CT.F_035.ab1 -40 972
Assembled_from CR05-C1-102-004-_B01_-CT.F_007.ab1 -32 1007
Assembled_from CR05-C1-103-033-_G08_-CT.F_026.ab1 397 1400
Assembled_from CR05-C1-102-060-_D07_-CT.F_029.ab1 403 1450
Assembled_from CR05-C1-102-008-_G03_-CT.F_010.ab1 404 1427
Assembled_from CR05-C1-102-065-_F12_-CT.F_043.ab1 406 1498
Sequence Contig3773
Assembled_from CR05-C1-103-041-_E11_-CT.F_044.ab1 -694 275
Assembled_from CR05-C1-102-019-_A11_-CT.F_048.ab1 -626 289
Assembled_from CR05-C1-102-019-_D03_-CT.F_013.ab1 -625 314
Assembled_from CR05-C1-102-019-_B11_-CT.F_047.ab1 -733 185
output:
Contig 3772
CR05-C1-102 6 CR05-C1-103 1
Contig 3773
CR05-C1-102 3 CR05-C1-103 1
I believe that it is not very complicated to do that, but I'm just not able to do it by myself...
Marco Takita
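A minimal sketch of the per-block count described above, offered as one possible approach rather than a definitive answer. It assumes every block starts with a line beginning with "Sequence" followed by the contig name, and that only the two prefixes CR05-C1-102 and CR05-C1-103 matter. The sub name count_per_contig and the sample variable are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count library prefixes per contig. Takes the input as a list of lines and
# returns an array ref of [contig_name, count_102, count_103], in the order
# the contigs appear in the input.
sub count_per_contig {
    my @lines = @_;
    my @blocks;
    for my $line (@lines) {
        # A "Sequence" line starts a new block.
        if ($line =~ /^Sequence\s+(\S+)/) {
            push @blocks, [$1, 0, 0];
        }
        next unless @blocks;    # ignore anything before the first contig
        # Count every occurrence on the line, so this works whether the
        # Assembled_from entries sit on their own lines or share one line.
        $blocks[-1][1]++ while $line =~ /CR05-C1-102/g;
        $blocks[-1][2]++ while $line =~ /CR05-C1-103/g;
    }
    return \@blocks;
}

# Sample data copied from the input shown above.
my @sample = split /\n/, <<'END';
Sequence Contig3772
Assembled_from CR05-C1-102-004-_A01_-CT.F_008.ab1 -40 955
Assembled_from CR05-C1-102-006-_E05_-CT.F_035.ab1 -40 972
Assembled_from CR05-C1-102-004-_B01_-CT.F_007.ab1 -32 1007
Assembled_from CR05-C1-103-033-_G08_-CT.F_026.ab1 397 1400
Assembled_from CR05-C1-102-060-_D07_-CT.F_029.ab1 403 1450
Assembled_from CR05-C1-102-008-_G03_-CT.F_010.ab1 404 1427
Assembled_from CR05-C1-102-065-_F12_-CT.F_043.ab1 406 1498
Sequence Contig3773
Assembled_from CR05-C1-103-041-_E11_-CT.F_044.ab1 -694 275
Assembled_from CR05-C1-102-019-_A11_-CT.F_048.ab1 -626 289
Assembled_from CR05-C1-102-019-_D03_-CT.F_013.ab1 -625 314
Assembled_from CR05-C1-102-019-_B11_-CT.F_047.ab1 -733 185
END

# Print one name line and one count line per contig.
for my $blk (@{ count_per_contig(@sample) }) {
    my ($name, $n102, $n103) = @$blk;
    print "$name\n";
    print "CR05-C1-102 $n102 CR05-C1-103 $n103\n";
}
```

To run this on a real file instead of the sample, something like count_per_contig(<>) should work, since <> in list context returns all lines of the files named on the command line. An array is used instead of a hash so the contigs come out in file order.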
On Jan 17, 2005, at 5:34 PM, Andrew Mace wrote:
Why not something like:
my %sequences = ();
my $seq;

while(<>) {
    if($_ =~ m/^Sequence ([^\n]+)$/) {
        $seq = $1;
        $sequences{$1} = [0,0];
    }
    elsif($_ =~ m/CR05-C1-10(\d)/) {
        if($1 == 2) {
            $sequences{$seq}->[0]++;
        }
        elsif($1 == 3) {
            $sequences{$seq}->[1]++;
        }
    }
}

my $total_102 = 0;
my $total_103 = 0;

for(keys %sequences) {
    print $_, ": 102 = ", $sequences{$_}->[0], "; 103 = ", $sequences{$_}->[1], "\n";
    $total_102 += $sequences{$_}->[0];
    $total_103 += $sequences{$_}->[1];
}

print "Total 102 = ", $total_102, "\n";
print "Total 103 = ", $total_103, "\n";
Andrew
On Jan 17, 2005, at 2:04 PM, Marco Takita wrote:
Hi guys, sorry for the question not directly related to macosx but this is the OS I work with and I know that you guys are really helpful.
I'm really new to Perl. Actually, I'm trying to write my very first script. Let me try to explain what I need. I have a large text file that is basically something like this:
Sequence Contig3772
Assembled_from CR05-C1-102-004-_A01_-CT.F_008.ab1 -40 955
Assembled_from CR05-C1-102-006-_E05_-CT.F_035.ab1 -40 972
Assembled_from CR05-C1-102-004-_B01_-CT.F_007.ab1 -32 1007
Assembled_from CR05-C1-103-033-_G08_-CT.F_026.ab1 397 1400
Assembled_from CR05-C1-102-060-_D07_-CT.F_029.ab1 403 1450
Assembled_from CR05-C1-102-008-_G03_-CT.F_010.ab1 404 1427
Assembled_from CR05-C1-102-065-_F12_-CT.F_043.ab1 406 1498
Sequence Contig3773
Assembled_from CR05-C1-103-041-_E11_-CT.F_044.ab1 -694 275
Assembled_from CR05-C1-102-019-_A11_-CT.F_048.ab1 -626 289
Assembled_from CR05-C1-102-019-_D03_-CT.F_013.ab1 -625 314
Assembled_from CR05-C1-102-019-_B11_-CT.F_047.ab1 -733 185
Sequence Contig3774
and so on.
What I need is to count how many times either CR05-C1-102 or CR05-C1-103 appears in the text, which I was able to do:
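#!/usr/bin/perl
use strict;
use warnings;

# Prefixes to count across the whole input (quoted so they are strings,
# not subtraction expressions).
my @text = ('CR05-C1-102', 'CR05-C1-103');
my ($score, $res) = (0, 0);

while (<>) {
    chomp;
    foreach my $wd (split) {
        $score++ if $wd =~ /\Q$text[0]\E/;
        $res++   if $wd =~ /\Q$text[1]\E/;
    }
}
print " CR05-C1-102 $score CR05-C1-103 $res \n\n";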
My problem is that I cannot do that for individual blocks like:
Sequence Contig3772
Assembled_from CR05-C1-102-004-_A01_-CT.F_008.ab1 -40 955
Assembled_from CR05-C1-102-006-_E05_-CT.F_035.ab1 -40 972
Assembled_from CR05-C1-102-004-_B01_-CT.F_007.ab1 -32 1007
Assembled_from CR05-C1-103-033-_G08_-CT.F_026.ab1 397 1400
Assembled_from CR05-C1-102-060-_D07_-CT.F_029.ab1 403 1450
Assembled_from CR05-C1-102-008-_G03_-CT.F_010.ab1 404 1427
Assembled_from CR05-C1-102-065-_F12_-CT.F_043.ab1 406 1498
I was not able to isolate this block from the rest of the text.
Any idea how to do that?
Thanks a lot
Dr. Marco Aurélio Takita, Ph.D.
Centro APTA Citros Sylvio Moreira
Rodovia Anhanguera Km 158
Caixa Postal 04
13490-970 Cordeirópolis - SP, BRAZIL
Tel.: 55-19-35461399
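On the question of isolating one block from the rest of the text: one possible sketch is to read the whole file into a string and split it just before each "Sequence" header. This assumes every header starts at the beginning of a line. The names split_blocks and find_block are made up for illustration, and the sample is a shortened version of the data above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split the file contents into blocks, one per "Sequence" header.
# The lookahead keeps each header inside its own block; the grep drops
# any text that might precede the first header.
sub split_blocks {
    my ($text) = @_;
    return grep { /^Sequence\s/ } split /^(?=Sequence\s)/m, $text;
}

# Return the block for one contig name, or undef if it is not present.
sub find_block {
    my ($text, $name) = @_;
    for my $block (split_blocks($text)) {
        return $block if $block =~ /^Sequence\s+\Q$name\E(?:\s|\z)/;
    }
    return undef;
}

# Shortened sample; a real file would be slurped into a string instead.
my $sample = <<'END';
Sequence Contig3772
Assembled_from CR05-C1-102-004-_A01_-CT.F_008.ab1 -40 955
Assembled_from CR05-C1-103-033-_G08_-CT.F_026.ab1 397 1400
Sequence Contig3773
Assembled_from CR05-C1-103-041-_E11_-CT.F_044.ab1 -694 275
Sequence Contig3774
END

# Print only the Contig3772 block (its header plus its Assembled_from lines).
print find_block($sample, 'Contig3772');
```

For a real file, the contents could be slurped with something like my $text = do { local $/; <> }; and passed to split_blocks. Once a block is isolated, any per-block processing (such as counting the prefixes) can run on that string alone.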