Hi,
Me again with another parsing script :)
This time I have the file format shown below. I want to check whether the genotype (the 2 letters at the start of each field) is the same in columns 1 and 2. NN should be ignored (no result was provided). I also need to count the number of heterozygous calls (the 2 letters are different) and homozygous calls (the 2 letters are identical), and then calculate:
homo_same: both columns are homozygous with the same genotype (e.g. line 8)
homo_diff: both columns are homozygous but with different genotypes (e.g. line 4)
het_same: both columns are heterozygous with the same genotype (e.g. line 1)
het_diff: both columns are heterozygous but with different genotypes (e.g. line 3)

AG;1.000       \t  AG;1.000\n
NN;0.775        NN;0.805
AC;0.999        TC;0.998
AA;0.998        TT;0.998
GG;0.997        GG;0.997
AG;1.000        AG;1.000
AG;0.979        NN;0.661
GG;0.996        GG;0.989

I have managed to calculate the proportion of each genotype across both columns, but I am struggling to find a way to calculate homo_same, homo_diff, het_same and het_diff.

This is my script, which should work with the example file above:
#!/software/bin/perl
use warnings;
use strict;

my $file=".txt";

my @hets=qw{AT AC AG TA TC TG GA GC GT CA CT CG};
my @homos=qw{AA TT CC GG};

#my ($homo_count_same,$homo_count_diff);
#my ($het_count_same,$het_count_diff);
my %geno;

my $line;

open( my $FH , '<' , $file ) or die( $! );
open(OUT, '>>', 'Test.txt') or die( $! );

while( <$FH> ) {
 chomp;
 $line++; #count input lines rather than fields
 my @Snps = split /\t/;
foreach my $Snp (@Snps) {
       if ($Snp =~/^NN/)
       { next; }
       foreach my $het (@hets){ #test if $snp is heterozygous
           if ($Snp=~/$het/){
print OUT $line,"\t",$Snp,"\t","heterozygous","\t",$het,"\n"; #print the genotype het
           $geno{$het}++;#count het in hash
           }
        }
        foreach my $hom (@homos){#test if $snp is homo
           if ($Snp=~/$hom/){

print OUT $line,"\t",$Snp,"\t","homozygous","\t",$hom,"\n"; #print the genotype homo
           $geno{$hom}++;#count homo in hash
           }

       }
}
#I guess I need to calculate my $homo_same and Homo_diff from here, but I am not sure how


}

foreach my $gen (keys %geno){
   print "$gen is present ",$geno{$gen}," times","\n";
}
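
For reference, the pairwise comparison described at the top could be sketched roughly like this. It is only a sketch under assumptions: classify_pair and count_pairs are hypothetical helper names that are not in the script above, the columns are assumed to be whitespace-separated GENOTYPE;score fields, genotypes are compared as exact strings (so AG and GA would count as different), and a line with one homozygous and one heterozygous call is counted in an extra "mixed" bucket because none of the four categories covers it.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: classify one pair of genotype calls.
# Returns undef when either call is NN or is not two letters from ACGT.
sub classify_pair {
    my ($first, $second) = @_;
    return unless defined $first && defined $second;
    return unless $first =~ /^[ACGT]{2}$/ && $second =~ /^[ACGT]{2}$/;

    # Homozygous means the two letters of a call are identical.
    my $first_is_homo  = substr($first, 0, 1) eq substr($first, 1, 1);
    my $second_is_homo = substr($second, 0, 1) eq substr($second, 1, 1);

    # One homozygous and one heterozygous call: outside the four categories.
    return 'mixed' if ($first_is_homo xor $second_is_homo);

    my $kind = $first_is_homo ? 'homo' : 'het';
    # Assumption: exact string comparison, so allele order matters.
    return $kind . ($first eq $second ? '_same' : '_diff');
}

# Hypothetical helper: read lines from a filehandle and tally the categories.
sub count_pairs {
    my ($fh) = @_;
    my %count = map { $_ => 0 } qw(homo_same homo_diff het_same het_diff mixed);
    while (my $row = <$fh>) {
        chomp $row;
        # Assumes two whitespace-separated columns per line.
        my ($col1, $col2) = split ' ', $row;
        next unless defined $col2;
        my ($g1) = split /;/, $col1;   # drop the ";score" part
        my ($g2) = split /;/, $col2;
        my $class = classify_pair($g1, $g2);
        $count{$class}++ if defined $class;
    }
    return %count;
}

# Demo on the example data from this post, read from an in-memory string.
my $sample = join "\n",
    "AG;1.000\tAG;1.000", "NN;0.775\tNN;0.805", "AC;0.999\tTC;0.998",
    "AA;0.998\tTT;0.998", "GG;0.997\tGG;0.997", "AG;1.000\tAG;1.000",
    "AG;0.979\tNN;0.661", "GG;0.996\tGG;0.989";
open my $fh, '<', \$sample or die $!;
my %count = count_pairs($fh);
close $fh;
print "$_: $count{$_}\n" for sort keys %count;
```

On the eight example lines this should report het_same 2, het_diff 1, homo_same 2, homo_diff 1 and mixed 0, with the two lines containing NN skipped. Comparing the two columns of a line together, rather than looping over the fields one at a time, is what makes the same/diff decision possible.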

Many thanks for any suggestions,
Nat

