parsing file need help please

Nathalie Conte Wed, 08 Jun 2011 07:49:20 -0700

Hi,
Me again with another parsing script :)

This time I have the file format shown below. I want to check whether the genotyping data (the 2 letters at the beginning of each field) are the same in columns 1 and 2. NN should be ignored (no result provided). I should also count the number of heterozygous calls (the 2 letters are different) and homozygous calls (the 2 letters are identical), and then calculate:

homo_same: both columns homozygous, same genotype (e.g. line 8)
homo_diff: both columns homozygous, different genotypes (e.g. line 4)
het_same: both columns heterozygous, same genotype (e.g. line 1)
het_diff: both columns heterozygous, different genotypes (e.g. line 3)


AG;1.000       \t  AG;1.000\n
NN;0.775        NN;0.805
AC;0.999        TC;0.998
AA;0.998        TT;0.998
GG;0.997        GG;0.997
AG;1.000        AG;1.000
AG;0.979        NN;0.661
GG;0.996        GG;0.989

I have managed to calculate the proportion of each genotype for both columns, but I am struggling to find a way to calculate homo_same, homo_diff, het_same and het_diff.


this is my script which should work with the example file above:
#!/software/bin/perl
use warnings;
use strict;

my $file=".txt";

my @hets=qw{AT AC AG TA TC TG GA GC GT CA CT CG};
my @homos=qw{AA TT CC GG};

#my ($homo_count_same,$homo_count_diff);
#my ($het_count_same,$het_count_diff);
my %geno;

my $line;

open( my $FH , '<' , $file ) or die( $! );
open(OUT, '>>', 'Test.txt') or die( $! );

while( <$FH> ) {
    my @Snps = split /\t/;
    foreach my $Snp (@Snps) {
        $line++;
        if ($Snp =~ /^NN/) { next; }
        foreach my $het (@hets) { # test if $Snp is heterozygous
            if ($Snp =~ /$het/) {
                print OUT $line,"\t",$Snp,"\t","heterozygous","\t",$het,"\n"; # print the het genotype
                $geno{$het}++; # count het in hash
            }
        }
        foreach my $hom (@homos) { # test if $Snp is homozygous
            if ($Snp =~ /$hom/) {
                print OUT $line,"\t",$Snp,"\t","homozygous","\t",$hom,"\n"; # print the homo genotype
                $geno{$hom}++; # count homo in hash
            }
        }
    }

    # I guess I need to calculate homo_same and homo_diff from here, but I am not sure how

}

foreach my $gen (keys %geno){
   print "$gen is present ",$geno{$gen}," times","\n";
}
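
For reference, the four pairwise counts described at the top could perhaps be computed along these lines. This is only a minimal sketch, not a tested solution for the real data. The names classify_pair, @rows and %count are made up for illustration. It assumes each row holds exactly two whitespace-separated fields of the form XX;score, that two heterozygous calls only count as "same" when the letters are in the same order (AG vs GA would be het_diff), and that a row with one homozygous and one heterozygous call goes into an extra "mixed" bucket, which the description above does not cover.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Classify one pair of genotype calls, e.g. ("AG", "AG").
# Returns nothing (undef in scalar context) when either call is NN,
# otherwise one of: homo_same, homo_diff, het_same, het_diff, mixed.
sub classify_pair {
    my ($g1, $g2) = @_;
    return if $g1 eq 'NN' or $g2 eq 'NN';

    # A call is homozygous when its two letters are identical.
    my $homo1 = substr($g1, 0, 1) eq substr($g1, 1, 1);
    my $homo2 = substr($g2, 0, 1) eq substr($g2, 1, 1);

    # Assumption: one homozygous and one heterozygous call is its own class.
    return 'mixed' if ($homo1 xor $homo2);

    my $kind = $homo1 ? 'homo' : 'het';
    return $kind . ($g1 eq $g2 ? '_same' : '_diff');
}

# The example rows from this message, standing in for the real file.
my @rows = (
    "AG;1.000\tAG;1.000",
    "NN;0.775\tNN;0.805",
    "AC;0.999\tTC;0.998",
    "AA;0.998\tTT;0.998",
    "GG;0.997\tGG;0.997",
    "AG;1.000\tAG;1.000",
    "AG;0.979\tNN;0.661",
    "GG;0.996\tGG;0.989",
);

my %count;
for my $row (@rows) {
    # split ' ' splits on any run of whitespace (tabs or spaces).
    my ($col1, $col2) = split ' ', $row;
    next unless defined $col2;

    # Keep only the two genotype letters in front of the ";".
    my ($g1) = $col1 =~ /^([ACGTN]{2});/;
    my ($g2) = $col2 =~ /^([ACGTN]{2});/;
    next unless defined $g1 and defined $g2;

    my $class = classify_pair($g1, $g2);
    $count{$class}++ if defined $class;
}

print "$_\t$count{$_}\n" for sort keys %count;
# For the example rows this prints:
# het_diff   1
# het_same   2
# homo_diff  1
# homo_same  2
```

To run it on a real file, the loop over @rows would be replaced by a while loop over the filehandle, with a chomp on each line.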

many thanks for any suggestions,
Nat

