On Mon, 17 Sep 2012 18:29:34 +0000
"Wang, Li" <li.w...@ttu.edu> wrote:

> Dear List members
> 
> I have three columns of a table.  The example is as follows:
> 
> DBS     R^2     genename
> 801     0.27807486057281494     POPTR_0002s00200
> 1903    1.0     POPTR_0002s00200
> 1103    0.25852271914482117     POPTR_0002s00200
> 3215    0.03134157508611679     POPTR_0002s00200
> 2415    0.010018552653491497    POPTR_0002s00200
> 1313    0.03134157508611679     POPTR_0002s00200
> 3442    1.0     POPTR_0002s00200
> 2642    0.25852271914482117     POPTR_0002s00200
> 1540    1.0     POPTR_0002s00200
> 228     0.03134157508611679     POPTR_0002s00200
> 3099    0.026160990819334984    POPTR_0002s00210
> 7555    0.800000011920929       POPTR_0002s00210
> 4457    0.014814814552664757    POPTR_0002s00210
> 7564    5.232862313278019E-4    POPTR_0002s00210
> 4466    0.0018315018387511373   POPTR_0002s00210
> 10      0.0036630036775022745   POPTR_0002s00210
> 7565    5.232862313278019E-4    POPTR_0002s00210
> 4467    0.0018315018387511373   POPTR_0002s00210
> 11      0.0036630036775022745   POPTR_0002s00210
> 2       1.0     POPTR_0002s00210
> 
> I would like to calculate the average value of column 2 while the
> content of column three is the same. In this case, I would like the
> output of my result be as follows:
> 
> R^2     genename
> 0.3899163577    POPTR_0002s00200
> 0.2314956035    POPTR_0002s00210
> 
> I do not know how to deal with columns in Perl. I thought about using
> a hash, but the keys of a hash cannot be repeated.

You'd probably want to use a hash where the key is the genename in the
third column of your input, and the value is an arrayref holding every
R^2 value seen for that genename. That lets you collate values against
genenames as you read, then calculate the averages at the end.
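
To make the structure concrete: a hash key can only appear once, but the
value stored under it can be a reference to an array, so a repeated
genename simply pushes another value onto the same array. A minimal
sketch, with three rows from the sample data hard-coded for illustration
(the @rows list is an assumption of this example, not part of the real
input handling):

```perl
use strict;
use warnings;

# Three sample rows from the question, hard-coded for illustration only.
my @rows = (
    [ 1903, '1.0', 'POPTR_0002s00200' ],
    [ 3442, '1.0', 'POPTR_0002s00200' ],
    [ 2,    '1.0', 'POPTR_0002s00210' ],
);

my %values_by_genename;
for my $row (@rows) {
    my ($dbs, $r2, $genename) = @$row;
    # The arrayref is created automatically the first time a key is seen.
    push @{ $values_by_genename{$genename} }, $r2;
}

# %values_by_genename now holds:
#   POPTR_0002s00200 => [ '1.0', '1.0' ]
#   POPTR_0002s00210 => [ '1.0' ]
for my $genename (sort keys %values_by_genename) {
    my $count = scalar @{ $values_by_genename{$genename} };
    print "$genename has $count value(s)\n";
}
```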

Something along the lines of:

    use strict;
    use warnings;
    use List::Util qw(sum);

    # For each row, record the value against the genename in question:
    my %values_by_genename;
    while (my $line = <>) {
        chomp $line;
        my ($dbs, $r2, $genename) = split ' ', $line;
        next unless defined $genename;       # skip blank or short lines
        next if $genename eq 'genename';     # skip header row
        push @{ $values_by_genename{$genename} }, $r2;
    }

    # Now, for each genename, calculate the average value
    for my $genename (sort keys %values_by_genename) {
        my @values = @{ $values_by_genename{$genename} };
        my $avg = sum(@values) / scalar @values;
        print "$avg,$genename\n";
    }

Of course, you'd be better off parsing the input using Text::CSV, but
the above should give you something to start from.
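
As a rough sketch of the Text::CSV route: the module (from CPAN, not
core Perl) will split each line for you once it knows the separator.
This assumes the real file is tab-separated, which the sample in the
question does not confirm. The inline $input string, with a header and
three rows taken from the question, and the %average_for hash are only
there to keep the example self-contained.

```perl
use strict;
use warnings;
use List::Util qw(sum);
use Text::CSV;    # CPAN module, needs installing separately

# Assumption: fields are separated by a single tab character.
my $csv = Text::CSV->new({ sep_char => "\t", binary => 1 })
    or die "Cannot use Text::CSV: " . Text::CSV->error_diag;

# Header plus three rows from the question, inlined for illustration only.
my $input = join "\n",
    "DBS\tR^2\tgenename",
    "1903\t1.0\tPOPTR_0002s00200",
    "1540\t1.0\tPOPTR_0002s00200",
    "2\t1.0\tPOPTR_0002s00210";

my %values_by_genename;
for my $line (split /\n/, $input) {
    $csv->parse($line) or die "Cannot parse line: $line";
    my ($dbs, $r2, $genename) = $csv->fields;
    next if $genename eq 'genename';    # skip header row
    push @{ $values_by_genename{$genename} }, $r2;
}

# Average the collected values for each genename.
my %average_for;
for my $genename (sort keys %values_by_genename) {
    my @values = @{ $values_by_genename{$genename} };
    $average_for{$genename} = sum(@values) / scalar @values;
    print "$average_for{$genename},$genename\n";
}
```

For a real file you would read lines from a filehandle instead of
splitting a string, but the parse-then-collect loop stays the same.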





-- 
David Precious ("bigpresh") <dav...@preshweb.co.uk>
http://www.preshweb.co.uk/     www.preshweb.co.uk/twitter
www.preshweb.co.uk/linkedin    www.preshweb.co.uk/facebook
www.preshweb.co.uk/cpan        www.preshweb.co.uk/github


