Dear Gurus, I have a file of the follwoing form
>FFM50HR02GMY4E length=75 xy=2604_3772 region=2 run=R_2008_08_19_08_32_31_ GGGGTCAATGGGTCCGACGGAGAAAGCGCGACAGAGGGGAAAGCCCTTTCCCCTCCCCGT TCGACTAGCGTCGTG >FFM50HR02F5QTS length=59 xy=2408_2686 region=2 run=R_2008_08_19_08_32_31_ AGGACATGCGGCCCGGCGACCTCATCATCTACTTCGACGACGCCAGCCACGTCGGGATG It has over 5000 such blocks, each starting with ">". I need to search for a given pattern (String of characters) in the second line of each block and then print the block header (>FFM50HR02F5QTS). I only need to parse the first 500 blocks of each file. Of these 500 blocks, I then need to output the number of times the pattern has occured. My code is below. I didn't think I has missed anythign till I manually went into each file to compare the results, which don't match. Can someone point me to whats going wrong here? #!/usr/bin/perl $file_to_parse = "/home/myfile"; $pattern = "CTTGGCGAGAAGGGCCGCTACCTGCTGGCCGCCTCCTTCGGCAACGT"; #$pattern = "abc"; $max_blocks = 500; # open the data file open (DAT, "$file_to_parse") || die ("Cannot open file: $file_to_parse"); $match_count = 0; $block_count = 0; $block = ""; while (<DAT>){ chomp (); #remove newline characters if ($_ =~ /^>/ && $. > 0){ #beginning of the next block reached #look for matches in the current block if ($block_count <= $max_blocks){ # check not more than $max_blocks $num_matches = () = $block =~ /$pattern/g; #how many matches in this block $match_count += $num_matches; #increase global match coutner $block =~ /^(>.+?)\s/g; #get block ID, e.g. >FIFKRKM06HCSVV $block_id = $1; if ($num_matches > 0){ #output information print "Block ID: $block_id\nBlock #: $block_count\nNumber of matches in this block: $num_matches\n\n"; } } $block = ""; #empty block holder variable $block_count++; #increase block count } #build the block, concatenate lines $block .= $_; } close DAT; print "Max number of blocks to search: $max_blocks\n"; print "Number of blocks found in this file: $block_count\n"; print "Total matches in $max_blocks blocks: $match_count\n\n"; # exit exit;