Hi,
Me again with another parsing script :)
This time I have the file format shown below. For each line I want to
check whether the genotype call (the 2 letters at the beginning of each
field) is the same in column 1 and column 2. NN should be ignored (no
result was provided).
I should also count the number of heterozygous calls (the 2 letters are
different) and homozygous calls (the 2 letters are identical), and then
calculate:
homo_same: both columns homozygous with the same genotype (e.g. line 8)
homo_diff: both columns homozygous with different genotypes (e.g. line 4)
het_same: both columns heterozygous with the same genotype (e.g. line 1)
het_diff: both columns heterozygous with different genotypes (e.g. line 3)
AG;1.000 \t AG;1.000\n
NN;0.775 NN;0.805
AC;0.999 TC;0.998
AA;0.998 TT;0.998
GG;0.997 GG;0.997
AG;1.000 AG;1.000
AG;0.979 NN;0.661
GG;0.996 GG;0.989
I have managed to count how often each genotype occurs across both
columns, but I am struggling to find a way to calculate homo_same,
homo_diff, het_same and het_diff.
This is my script, which should work with the example file above:
#!/software/bin/perl
use warnings;
use strict;

my $file  = ".txt";
my @hets  = qw{AT AC AG TA TC TG GA GC GT CA CT CG};
my @homos = qw{AA TT CC GG};
#my ($homo_count_same,$homo_count_diff);
#my ($het_count_same,$het_count_diff);
my %geno;
my $line = 0;

open( my $FH,  '<',  $file )      or die "Cannot open $file: $!";
open( my $OUT, '>>', 'Test.txt' ) or die "Cannot open Test.txt: $!";

while ( <$FH> ) {
    chomp;
    $line++;    # count input lines rather than individual columns
    my @Snps = split /\t/;
    foreach my $Snp (@Snps) {
        next if $Snp =~ /^\s*NN/;    # no result for this call
        foreach my $het (@hets) {    # test if $Snp is heterozygous
            if ( $Snp =~ /^\s*$het;/ ) {
                # print the het genotype
                print $OUT $line, "\t", $Snp, "\t", "heterozygous", "\t", $het, "\n";
                $geno{$het}++;       # count het in hash
            }
        }
        foreach my $hom (@homos) {   # test if $Snp is homozygous
            if ( $Snp =~ /^\s*$hom;/ ) {
                # print the homo genotype
                print $OUT $line, "\t", $Snp, "\t", "homozygous", "\t", $hom, "\n";
                $geno{$hom}++;       # count homo in hash
            }
        }
    }
    # I guess I need to calculate $homo_same and $homo_diff from here, but
    # I am not sure how
}
close $FH;
close $OUT;

foreach my $gen ( keys %geno ) {
    print "$gen is present ", $geno{$gen}, " times", "\n";
}
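For reference, here is a minimal sketch of one way the per-line comparison
described above might be done. It is only an illustration under stated
assumptions, not a tested solution: the sub names genotype, classify_pair
and count_pairs are hypothetical, genotypes are compared as literal strings
(so AG and GA would count as different), and a line with one homozygous and
one heterozygous call is put in a separate "mixed" class, which the four
categories above do not cover.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Extract the two-letter genotype from a field such as "AG;1.000".
# Returns undef when the field is missing or does not look like a call.
sub genotype {
    my ($field) = @_;
    return undef unless defined $field;
    return $field =~ /^([ACGTN]{2});/ ? $1 : undef;
}

# Classify the two genotypes found on one line.
# Returns homo_same, homo_diff, het_same, het_diff or mixed,
# or undef when either call is missing or NN (assumption: skip such lines).
sub classify_pair {
    my ( $g1, $g2 ) = @_;
    return undef if !defined $g1 || !defined $g2;
    return undef if $g1 eq 'NN' || $g2 eq 'NN';

    # A call is homozygous when its two letters are identical.
    my $hom1 = substr( $g1, 0, 1 ) eq substr( $g1, 1, 1 );
    my $hom2 = substr( $g2, 0, 1 ) eq substr( $g2, 1, 1 );

    # Assumption: one homozygous and one heterozygous call is "mixed".
    return 'mixed' if ( $hom1 xor $hom2 );

    my $kind = $hom1 ? 'homo' : 'het';
    return $kind . ( $g1 eq $g2 ? '_same' : '_diff' );
}

# Tally the classes over a list of input lines.
sub count_pairs {
    my (@lines) = @_;
    my %count;
    for my $line (@lines) {
        # split ' ' splits on any whitespace, so tabs or spaces both work
        my ( $col1, $col2 ) = split ' ', $line;
        my $class = classify_pair( genotype($col1), genotype($col2) );
        $count{$class}++ if defined $class;
    }
    return %count;
}

# The eight example lines from the file format shown earlier.
my @sample = (
    "AG;1.000\tAG;1.000", "NN;0.775\tNN;0.805",
    "AC;0.999\tTC;0.998", "AA;0.998\tTT;0.998",
    "GG;0.997\tGG;0.997", "AG;1.000\tAG;1.000",
    "AG;0.979\tNN;0.661", "GG;0.996\tGG;0.989",
);
my %count = count_pairs(@sample);
print "$_\t$count{$_}\n" for sort keys %count;
```

On the eight example lines this should report het_diff 1, het_same 2,
homo_diff 1 and homo_same 2, with the two lines containing NN skipped.
Comparing the whole line at once, rather than looping over each column
separately, is what makes the same/diff decision possible.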
Many thanks for any suggestions,
Nat